System Design Interview: Design Twitter / X Timeline

Designing Twitter’s home timeline is one of the most-cited system design interview problems. It exemplifies the extreme read/write asymmetry of social networks and forces you to reason carefully about fan-out trade-offs, caching strategies, and the architectural decisions that separate large-scale systems from small ones.

Functional Requirements

  • Post tweets (text, images, videos, polls)
  • Follow / unfollow users
  • Home timeline: chronological or ranked feed of tweets from followed accounts
  • User timeline: all tweets from a specific user
  • Search tweets by keyword or hashtag
  • Like, retweet, and reply
  • Trending topics

Non-Functional Requirements

  • 400M+ monthly active users; ~500M tweets posted per day
  • Read-heavy: timeline reads are 100x more frequent than tweet writes
  • Timeline latency: <200ms p99
  • Tweet fanout: some users follow 100K+ accounts; some have 100M+ followers
  • Availability over consistency: eventual consistency acceptable for timelines
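
These numbers translate directly into capacity estimates worth stating out loud in the interview. A back-of-envelope sketch (the 3x peak factor and ~1 KB per tweet row are assumptions, not source figures):

```python
# Rough sizing from the requirements above (illustrative assumptions only).

TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 86_400
READ_WRITE_RATIO = 100

write_qps = TWEETS_PER_DAY / SECONDS_PER_DAY   # ~5,800 tweets/s on average
peak_write_qps = write_qps * 3                 # assumed 3x peak-to-average factor
read_qps = write_qps * READ_WRITE_RATIO        # ~580K timeline reads/s

# Storage: assume ~1 KB per tweet row (text + metadata; media lives in S3)
bytes_per_day = TWEETS_PER_DAY * 1_000
tb_per_year = bytes_per_day * 365 / 1e12       # ~180 TB/year of tweet rows

print(f"avg write QPS: {write_qps:,.0f}")
print(f"avg read QPS:  {read_qps:,.0f}")
print(f"tweet storage: ~{tb_per_year:,.0f} TB/year")
```

The read figure is what justifies pre-computing timelines: ~580K reads/s cannot all fan out to the social graph at request time.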

Tweet Storage

-- Tweets table (MySQL/PostgreSQL, sharded by user_id)
CREATE TABLE tweets (
    tweet_id    BIGINT PRIMARY KEY,  -- Snowflake ID (time-ordered)
    user_id     BIGINT NOT NULL,
    content     VARCHAR(280),
    media_keys  JSON,               -- S3 keys for attached media
    reply_to    BIGINT,             -- tweet_id of parent tweet
    retweet_of  BIGINT,             -- tweet_id of original tweet
    like_count  BIGINT DEFAULT 0,
    retweet_count BIGINT DEFAULT 0,
    created_at  TIMESTAMP
);
-- Newest-first user timeline queries. (Inline INDEX clauses are
-- MySQL-only syntax; a standalone statement also works in PostgreSQL.)
CREATE INDEX idx_tweets_user_time ON tweets (user_id, tweet_id DESC);

-- Social graph (who follows whom) — Cassandra or separate MySQL shard
CREATE TABLE follows (
    follower_id  BIGINT,
    followee_id  BIGINT,
    created_at   TIMESTAMP,
    PRIMARY KEY  (follower_id, followee_id)
);
CREATE TABLE followers (
    followee_id  BIGINT,
    follower_id  BIGINT,
    PRIMARY KEY  (followee_id, follower_id)
);
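
Storing the edge in both directions is deliberate: `follows` answers “who does X follow?” and `followers` answers “who follows X?”, each as a single-partition, index-only lookup. A minimal in-memory sketch of the dual write (dict-based stand-ins for the two tables, not production code):

```python
from collections import defaultdict

# In-memory stand-ins for the two tables above: the same edge is written
# in both directions so each lookup direction is a single-partition read.
follows = defaultdict(set)     # follower_id -> set of followee_ids
followers = defaultdict(set)   # followee_id -> set of follower_ids

def follow(follower_id: int, followee_id: int) -> None:
    # In production, pair these writes in a batch (Cassandra) or a
    # transaction (MySQL) so the two directions cannot drift apart.
    follows[follower_id].add(followee_id)
    followers[followee_id].add(follower_id)

def unfollow(follower_id: int, followee_id: int) -> None:
    follows[follower_id].discard(followee_id)
    followers[followee_id].discard(follower_id)

follow(1, 42)
follow(2, 42)
assert follows[1] == {42}        # "who does user 1 follow?"
assert followers[42] == {1, 2}   # "who follows user 42?"
```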

Timeline Generation: The Core Challenge

The fundamental question: when User A loads their timeline, how do you efficiently retrieve and rank tweets from all accounts A follows?

Approach 1: Fan-out on Write (Push Model)

# Assumed module-level handles (pseudocode): redis client, tweet_db,
# social_graph, and a fanout_queue message broker
class FanoutOnWriteService:
    """
    When a tweet is posted, push it to all followers' timeline caches immediately.
    Read: O(1) — just read pre-computed timeline from Redis.
    Write: O(followers) — expensive for users with millions of followers.
    """
    def post_tweet(self, user_id: int, content: str) -> int:
        tweet_id = generate_snowflake_id()

        # 1. Persist tweet
        tweet = {'tweet_id': tweet_id, 'user_id': user_id,
                 'content': content, 'created_at': time.time()}
        tweet_db.insert(tweet)

        # 2. Async fan-out to all followers' caches
        fanout_queue.publish({'tweet_id': tweet_id, 'author_id': user_id})
        return tweet_id

    def process_fanout(self, tweet_id: int, author_id: int):
        """Worker: pushes tweet to each follower's timeline."""
        followers = social_graph.get_followers(author_id)  # Could be 100M

        BATCH_SIZE = 1000
        for i in range(0, len(followers), BATCH_SIZE):
            batch = followers[i:i + BATCH_SIZE]
            pipeline = redis.pipeline()
            for follower_id in batch:
                key = f"timeline:{follower_id}"
                pipeline.zadd(key, {tweet_id: time.time()})
                pipeline.zremrangebyrank(key, 0, -801)  # Keep 800 most recent
            pipeline.execute()

    def get_timeline(self, user_id: int, count: int = 20) -> list:
        """O(1) read — just fetch from Redis sorted set."""
        tweet_ids = redis.zrevrange(f"timeline:{user_id}", 0, count - 1)
        return tweet_db.get_by_ids(tweet_ids)

Approach 2: Fan-out on Read (Pull Model)

from concurrent.futures import ThreadPoolExecutor, as_completed

class FanoutOnReadService:
    """
    When timeline is requested, query all followed accounts and merge.
    Read: O(following_count) — slow for users following many accounts.
    Write: O(1) — just persist the tweet.
    Good for: low-follower-count authors; systems where write cost matters more.
    """
    def get_timeline(self, user_id: int, count: int = 20) -> list:
        following = social_graph.get_following(user_id)

        # Fetch recent tweets from each followed account in parallel
        all_tweets = []
        with ThreadPoolExecutor(max_workers=20) as executor:
            futures = {
                executor.submit(tweet_db.get_user_tweets, uid, limit=50): uid
                for uid in following
            }
            for future in as_completed(futures):
                all_tweets.extend(future.result())

        # Sort by time, take top N
        all_tweets.sort(key=lambda t: t['tweet_id'], reverse=True)
        return all_tweets[:count]

Approach 3: Hybrid (Twitter’s Actual Strategy)

class HybridTimelineService:
    """
    Twitter uses hybrid: push for regular users, pull for celebrities.
    - Regular users (< ~10K followers): tweets fanned out on write as usual
    - Celebrities (>= ~10K followers): excluded from fan-out; pulled on read
    - Timeline = pre-computed (regular users) + on-demand pull (celebrities)
    """
    CELEBRITY_THRESHOLD = 10_000

    def post_tweet(self, user_id: int, content: str) -> int:
        tweet_id = self._persist_tweet(user_id, content)
        follower_count = social_graph.get_follower_count(user_id)

        if follower_count < self.CELEBRITY_THRESHOLD:
            # Regular user: async fan-out to followers' timeline caches
            fanout_queue.publish({'tweet_id': tweet_id, 'author_id': user_id})
        # Celebrities skip fan-out; their tweets are pulled at read time
        return tweet_id

    def get_timeline(self, user_id: int, count: int = 20) -> list:
        # Step 1: Pre-computed timeline (regular users the person follows)
        cached_ids = redis.zrevrange(f"timeline:{user_id}", 0, count * 2)

        # Step 2: Find celebrities the user follows
        following = social_graph.get_following(user_id)
        celebrities = [
            uid for uid in following
            if social_graph.get_follower_count(uid) >= self.CELEBRITY_THRESHOLD
        ]

        # Step 3: Pull celebrity tweets on-demand
        celeb_tweets = []
        for celeb_id in celebrities[:50]:  # Cap to avoid latency spike
            celeb_tweets.extend(tweet_db.get_user_tweets(celeb_id, limit=10))

        # Step 4: Merge and re-rank
        regular_tweets = tweet_db.get_by_ids(cached_ids)
        all_tweets = regular_tweets + celeb_tweets
        all_tweets.sort(key=lambda t: t['tweet_id'], reverse=True)

        # Step 5: Apply ranking model (engagement signals, freshness, relevance)
        return self.ranking_model.rank(all_tweets[:count * 3], user_id)[:count]

Snowflake ID Generation

import threading
import time

class SnowflakeIDGenerator:
    """
    Twitter's Snowflake: 64-bit time-ordered unique ID.
    Layout: [41-bit timestamp ms][10-bit machine ID][12-bit sequence]
    Generates ~4096 IDs/ms per machine. Sortable by creation time.
    """
    EPOCH = 1288834974657  # Twitter epoch: Nov 04, 2010
    MACHINE_ID_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, machine_id: int):
        self.machine_id = machine_id & ((1 << self.MACHINE_ID_BITS) - 1)
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            timestamp = int(time.time() * 1000) - self.EPOCH
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted; wait for next millisecond
                    while timestamp <= self.last_timestamp:
                        timestamp = int(time.time() * 1000) - self.EPOCH
            else:
                self.sequence = 0
            self.last_timestamp = timestamp
            return ((timestamp << 22) |
                    (self.machine_id << 12) |
                    self.sequence)
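
The bit layout also makes IDs self-describing: shifting and masking recovers the creation time, machine, and sequence, which is how time-range queries and cursor pagination work without consulting a created_at column. A small decoding sketch mirroring the 41/10/12 layout above:

```python
# Decode a Snowflake ID back into its components
# (layout: [41-bit timestamp ms][10-bit machine ID][12-bit sequence]).
EPOCH = 1288834974657  # Twitter epoch, ms

def decode_snowflake(snowflake_id: int) -> dict:
    return {
        'timestamp_ms': (snowflake_id >> 22) + EPOCH,  # absolute unix ms
        'machine_id': (snowflake_id >> 12) & 0x3FF,    # 10 bits
        'sequence': snowflake_id & 0xFFF,              # 12 bits
    }

# Round-trip check against the encoder's layout
ts, machine, seq = 1_000_000, 7, 42
sid = (ts << 22) | (machine << 12) | seq
parts = decode_snowflake(sid)
assert parts['timestamp_ms'] == ts + EPOCH
assert parts['machine_id'] == 7 and parts['sequence'] == 42
```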

Trending Topics

class TrendingService:
    """
    Count hashtag mentions in a sliding window.
    Count-Min Sketch for memory-efficient approximate counts.
    """
    WINDOW_SECONDS = 3600  # 1-hour trending window

    def record_hashtag(self, hashtag: str):
        # Redis sorted set: score = count, member = hashtag
        key = f"trending:{int(time.time() // 300)}"  # 5-min buckets
        redis.zincrby(key, 1, hashtag.lower())
        redis.expire(key, self.WINDOW_SECONDS * 2)

    def get_trending(self, limit: int = 10, geo: str = "global") -> list:
        # Aggregate across last N buckets
        now_bucket = int(time.time() // 300)
        buckets = [f"trending:{now_bucket - i}" for i in range(12)]  # Last hour

        # ZUNIONSTORE to combine all buckets
        dest_key = f"trending:merged:{now_bucket}"
        redis.zunionstore(dest_key, buckets)
        redis.expire(dest_key, 300)

        return redis.zrevrange(dest_key, 0, limit - 1, withscores=True)
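
To favor velocity over raw volume, recent buckets can be weighted more heavily when merging (Redis ZUNIONSTORE accepts per-key WEIGHTS for exactly this). A pure-Python sketch of the same weighted merge, using plain dicts as stand-ins for the 5-minute buckets:

```python
from collections import Counter

def weighted_trending(buckets: list, decay: float = 0.8, limit: int = 10):
    """Merge per-bucket hashtag counts, weighting recent buckets more.

    buckets[0] is the most recent 5-minute bucket; bucket i gets weight
    decay**i. With Redis, this maps to ZUNIONSTORE with a WEIGHTS clause.
    """
    scores = Counter()
    for i, bucket in enumerate(buckets):
        weight = decay ** i
        for tag, count in bucket.items():
            scores[tag] += count * weight
    return scores.most_common(limit)

# A tag spiking right now beats one with steady historical volume:
buckets = [{'#breaking': 50, '#monday': 20},   # most recent bucket
           {'#monday': 20},
           {'#monday': 20}]
top = weighted_trending(buckets, decay=0.5)
assert top[0][0] == '#breaking'   # 50 vs 20 + 10 + 5 = 35
```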

Interview Discussion Points

  • Timeline ranking: Pure reverse chronological → engagement-weighted → ML ranking (Twitter’s algorithm is open-sourced; uses 48M parameter neural network)
  • Cache eviction: User timelines in Redis with fixed-size window (keep last 800 tweet IDs). For users who haven’t logged in for 30 days, evict and recompute on demand.
  • Tweet deletion: Soft delete in DB; background job removes tweet_id from all cached timelines. Hard problem at scale — may take minutes to fully propagate.
  • Retweets: Store retweet as a new tweet record with retweet_of field. Fan-out same as original tweet. Deduplicate in timeline (don’t show original + retweet).
  • Lists / filtered timelines: Same fan-out but to a separate timeline key per list. Only fan-out to users where the author is in that list.
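
One more discussion point worth rehearsing: because Snowflake IDs sort by creation time, they double as pagination cursors — each page's smallest ID becomes the next request's max_id, avoiding O(N) OFFSET scans. A hedged in-memory sketch (the list stands in for an indexed query):

```python
def get_timeline_page(tweets: list, count: int = 20, max_id=None):
    """Cursor-based pagination over time-sorted tweet IDs.

    `tweets` stands in for an indexed table; in SQL this page is
    WHERE tweet_id < :max_id ORDER BY tweet_id DESC LIMIT :count,
    an index seek rather than an OFFSET scan-and-discard.
    """
    candidates = sorted(tweets, key=lambda t: t['tweet_id'], reverse=True)
    if max_id is not None:
        candidates = [t for t in candidates if t['tweet_id'] < max_id]
    page = candidates[:count]
    # Full page => more may remain; cursor is the smallest ID on this page
    next_cursor = page[-1]['tweet_id'] if len(page) == count else None
    return page, next_cursor

data = [{'tweet_id': i} for i in range(1, 8)]          # IDs 1..7
page1, cursor = get_timeline_page(data, count=3)
assert [t['tweet_id'] for t in page1] == [7, 6, 5] and cursor == 5
page2, cursor = get_timeline_page(data, count=3, max_id=cursor)
assert [t['tweet_id'] for t in page2] == [4, 3, 2]
```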

Frequently Asked Questions

What is the fan-out problem in Twitter timeline design?

Fan-out is the work needed to deliver a tweet to all followers. Fan-out on write (push model): when a tweet is posted, immediately write it to every follower’s timeline cache — O(1) reads (the timeline is pre-computed), but high write amplification for accounts with millions of followers, since one celebrity tweet triggers millions of cache writes. Fan-out on read (pull model): store tweets by author only and merge from all followed accounts at read time — O(1) writes, but slow reads for users who follow thousands of accounts. Twitter uses a hybrid: fan-out on write for regular users, fan-out on read for celebrities above a follower threshold (~10,000).

How does Twitter generate unique tweet IDs at scale?

Snowflake IDs: 64-bit integers composed of a 41-bit millisecond timestamp, a 10-bit machine/datacenter ID, and a 12-bit sequence number, allowing 4,096 IDs per millisecond per machine. Key properties: (1) time-sortable — IDs increase with creation time, enabling efficient range queries and cursor-based pagination without OFFSET; (2) globally unique — the machine ID prevents collisions across servers; (3) coordination-free — each server generates IDs independently, with no central lock; (4) fits in a 64-bit integer — efficient storage and indexing in both SQL and NoSQL systems.

How do you implement trending topics?

Use Redis sorted sets with time-bucketed keys: each 5-minute window gets its own key, and every hashtag mention runs ZINCRBY on the current bucket. To compute “last hour” trends, ZUNIONSTORE merges the last 12 five-minute buckets and ZREVRANGE returns the top-K topics. Weighting recent buckets more heavily (e.g., 1.0 for the newest, decaying by 0.8 per bucket) measures trending (velocity) rather than popular (volume): a topic suddenly mentioned 10x more often outranks one that is always mentioned frequently.

How do you design cursor-based pagination for timelines?

Use Snowflake tweet IDs as cursors instead of numeric offsets. To fetch the next page, pass max_id (the smallest ID from the previous page): SELECT * FROM tweets WHERE author_id IN (…) AND tweet_id < max_id ORDER BY tweet_id DESC LIMIT 20. Because Snowflake IDs are time-sorted, this is an index seek that skips already-seen tweets, whereas OFFSET pagination scans and discards N rows on every page flip — O(N) per request versus O(1) per page regardless of feed depth.
