System Design Interview: Design Twitter / X Timeline
Designing Twitter’s home timeline is one of the most-cited system design interview problems. It exemplifies the extreme read/write asymmetry of social networks and forces you to reason through fan-out trade-offs, caching strategies, and the architectural decisions that separate large-scale systems from small ones.
Functional Requirements
- Post tweets (text, images, videos, polls)
- Follow / unfollow users
- Home timeline: chronological or ranked feed of tweets from followed accounts
- User timeline: all tweets from a specific user
- Search tweets by keyword or hashtag
- Like, retweet, and reply
- Trending topics
Non-Functional Requirements
- 400M+ monthly active users; ~500M tweets posted per day
- Read-heavy: timeline reads are 100x more frequent than tweet writes
- Timeline latency: <200ms p99
- Tweet fanout: some users follow 100K+ accounts; some have 100M+ followers
- Availability over consistency: eventual consistency acceptable for timelines
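A quick back-of-envelope check on these numbers helps anchor the design (average rates only; peak traffic can be several times higher):

```python
# Rough capacity estimate from the stated requirements.
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 86_400
READ_WRITE_RATIO = 100          # timeline reads per tweet write (from above)

write_qps = TWEETS_PER_DAY / SECONDS_PER_DAY   # avg tweet writes per second
read_qps = write_qps * READ_WRITE_RATIO        # avg timeline reads per second

# Storage: 280 chars plus metadata and indexes; ~1 KB/row is a common estimate
storage_per_day_gb = TWEETS_PER_DAY * 1_000 / 1e9
```

That works out to roughly 5,800 writes/s, ~580K reads/s, and ~500 GB of new tweet data per day, which is why the read path (timeline caching) dominates the design.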
Tweet Storage
-- Tweets table (MySQL/PostgreSQL, sharded by user_id)
CREATE TABLE tweets (
    tweet_id      BIGINT PRIMARY KEY,  -- Snowflake ID (time-ordered)
    user_id       BIGINT NOT NULL,
    content       VARCHAR(280),
    media_keys    JSON,                -- S3 keys for attached media
    reply_to      BIGINT,              -- tweet_id of parent tweet
    retweet_of    BIGINT,              -- tweet_id of original tweet
    like_count    BIGINT DEFAULT 0,
    retweet_count BIGINT DEFAULT 0,
    created_at    TIMESTAMP,
    INDEX (user_id, tweet_id DESC)     -- Efficient user timeline queries (MySQL 8+ syntax)
);

-- Social graph (who follows whom) — Cassandra or a separate MySQL shard.
-- Stored twice, keyed both ways, so "who do I follow?" and "who follows me?"
-- are each a single-partition lookup.
CREATE TABLE follows (
    follower_id BIGINT,
    followee_id BIGINT,
    created_at  TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE TABLE followers (
    followee_id BIGINT,
    follower_id BIGINT,
    PRIMARY KEY (followee_id, follower_id)
);
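The `(user_id, tweet_id DESC)` index is what makes user timeline reads and cursor-style paging cheap. A minimal sketch of that access pattern, using an in-memory SQLite database as a stand-in for the sharded store (table contents here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, "
             "user_id INTEGER, content TEXT)")
conn.execute("CREATE INDEX idx_user_time ON tweets (user_id, tweet_id DESC)")
# 10 tweets by user 42; tweet_ids are time-ordered like Snowflake IDs
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                 [(i, 42, f"tweet {i}") for i in range(1, 11)])

def user_timeline(user_id, max_id=None, limit=3):
    """One page of a user's tweets; pass the smallest seen ID as max_id."""
    if max_id is None:
        q = ("SELECT tweet_id FROM tweets WHERE user_id = ? "
             "ORDER BY tweet_id DESC LIMIT ?")
        args = (user_id, limit)
    else:
        q = ("SELECT tweet_id FROM tweets WHERE user_id = ? AND tweet_id < ? "
             "ORDER BY tweet_id DESC LIMIT ?")
        args = (user_id, max_id, limit)
    return [row[0] for row in conn.execute(q, args)]

page1 = user_timeline(42)                     # [10, 9, 8]
page2 = user_timeline(42, max_id=page1[-1])   # [7, 6, 5]
```

Because the cursor is a `WHERE tweet_id < ?` predicate rather than an `OFFSET`, each page is an index seek regardless of how deep into the timeline the reader has scrolled.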
Timeline Generation: The Core Challenge
The fundamental question: when User A loads their timeline, how do you efficiently retrieve and rank tweets from all accounts A follows?
Approach 1: Fan-out on Write (Push Model)
class FanoutOnWriteService:
    """
    When a tweet is posted, push it to all followers' timeline caches immediately.
    Read: O(1) — just read the pre-computed timeline from Redis.
    Write: O(followers) — expensive for users with millions of followers.
    """

    def post_tweet(self, user_id: int, content: str) -> int:
        tweet_id = generate_snowflake_id()
        # 1. Persist tweet
        tweet = {'tweet_id': tweet_id, 'user_id': user_id,
                 'content': content, 'created_at': time.time()}
        tweet_db.insert(tweet)
        # 2. Async fan-out to all followers' caches
        fanout_queue.publish({'tweet_id': tweet_id, 'author_id': user_id})
        return tweet_id

    def process_fanout(self, tweet_id: int, author_id: int):
        """Worker: pushes the tweet onto each follower's timeline."""
        followers = social_graph.get_followers(author_id)  # Could be 100M
        BATCH_SIZE = 1000
        for i in range(0, len(followers), BATCH_SIZE):
            batch = followers[i:i + BATCH_SIZE]
            pipeline = redis.pipeline()
            for follower_id in batch:
                key = f"timeline:{follower_id}"
                pipeline.zadd(key, {tweet_id: time.time()})
                pipeline.zremrangebyrank(key, 0, -801)  # Keep 800 most recent
            pipeline.execute()

    def get_timeline(self, user_id: int, count: int = 20) -> list:
        """O(1) read — just fetch from the Redis sorted set."""
        tweet_ids = redis.zrevrange(f"timeline:{user_id}", 0, count - 1)
        return tweet_db.get_by_ids(tweet_ids)
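The push-model mechanics (newest-first timelines capped at 800 entries) can be demonstrated end-to-end with plain dicts standing in for Redis; the names below are illustrative, not the service above verbatim:

```python
from collections import defaultdict

TIMELINE_CAP = 800
followers = {"alice": ["bob", "carol"], "dave": ["bob"]}
timelines = defaultdict(list)  # follower -> newest-first list of tweet IDs

def post(author: str, tweet_id: int) -> None:
    """Fan-out on write: push the new tweet to every follower's timeline."""
    for f in followers.get(author, []):
        tl = timelines[f]
        tl.insert(0, tweet_id)   # newest first
        del tl[TIMELINE_CAP:]    # trim, keeping the 800 most recent

post("alice", 101)
post("dave", 102)
post("alice", 103)
# bob follows both authors -> [103, 102, 101]; carol follows only alice -> [103, 101]
```

Reads are then a single lookup (`timelines[user]`), which is the whole point of paying the write-time fan-out cost.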
Approach 2: Fan-out on Read (Pull Model)
from concurrent.futures import ThreadPoolExecutor, as_completed

class FanoutOnReadService:
    """
    When a timeline is requested, query all followed accounts and merge.
    Read: O(following_count) — slow for users following many accounts.
    Write: O(1) — just persist the tweet.
    Good for: low-follower-count authors; systems where write cost matters more.
    """

    def get_timeline(self, user_id: int, count: int = 20) -> list:
        following = social_graph.get_following(user_id)
        # Fetch recent tweets from each followed account in parallel
        all_tweets = []
        with ThreadPoolExecutor(max_workers=20) as executor:
            futures = {
                executor.submit(tweet_db.get_user_tweets, uid, limit=50): uid
                for uid in following
            }
            for future in as_completed(futures):
                all_tweets.extend(future.result())
        # Sort by time (Snowflake IDs are time-ordered), take top N
        all_tweets.sort(key=lambda t: t['tweet_id'], reverse=True)
        return all_tweets[:count]
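Since each author's tweets come back already sorted by descending tweet_id, the concatenate-and-sort step can be replaced by a lazy k-way heap merge that stops as soon as it has `count` items — a sketch of that alternative (not the service above verbatim):

```python
import heapq

def merge_timelines(per_author: list, count: int) -> list:
    """k-way merge of newest-first tweet_id lists, taking the top `count`.

    heapq.merge yields ascending order, so IDs are negated on the way in
    and back on the way out to get newest-first without a full sort.
    """
    merged = heapq.merge(*([-t for t in timeline] for timeline in per_author))
    return [-t for _, t in zip(range(count), merged)]

print(merge_timelines([[9, 5, 1], [8, 7, 2], [6, 3]], count=5))  # [9, 8, 7, 6, 5]
```

For a user following thousands of accounts this avoids materializing and sorting every candidate tweet just to return the first page.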
Approach 3: Hybrid (Twitter’s Actual Strategy)
class HybridTimelineService:
    """
    Twitter uses a hybrid: push for regular users, pull for celebrities.
    - Regular users (< ~10K followers): tweets fanned out on write
    - Celebrities (>= ~10K followers): excluded from fan-out; pulled on read
    - Timeline = pre-computed (regular users) + on-demand pull (celebrities)
    """

    CELEBRITY_THRESHOLD = 10_000

    def post_tweet(self, user_id: int, content: str) -> int:
        tweet_id = self._persist_tweet(user_id, content)
        follower_count = social_graph.get_follower_count(user_id)
        if follower_count < self.CELEBRITY_THRESHOLD:
            # Regular user: push to followers' cached timelines
            fanout_queue.publish({'tweet_id': tweet_id, 'author_id': user_id})
        # Celebrity tweets skip fan-out and are pulled at read time instead
        return tweet_id

    def get_timeline(self, user_id: int, count: int = 20) -> list:
        # Step 1: Pre-computed timeline (regular users the person follows)
        cached_ids = redis.zrevrange(f"timeline:{user_id}", 0, count * 2)
        # Step 2: Find celebrities the user follows
        following = social_graph.get_following(user_id)
        celebrities = [
            uid for uid in following
            if social_graph.get_follower_count(uid) >= self.CELEBRITY_THRESHOLD
        ]
        # Step 3: Pull celebrity tweets on-demand
        celeb_tweets = []
        for celeb_id in celebrities[:50]:  # Cap to avoid latency spike
            celeb_tweets.extend(tweet_db.get_user_tweets(celeb_id, limit=10))
        # Step 4: Merge and re-rank
        regular_tweets = tweet_db.get_by_ids(cached_ids)
        all_tweets = regular_tweets + celeb_tweets
        all_tweets.sort(key=lambda t: t['tweet_id'], reverse=True)
        # Step 5: Apply ranking model (engagement signals, freshness, relevance)
        return self.ranking_model.rank(all_tweets[:count * 3], user_id)[:count]
Snowflake ID Generation
import threading
import time

class SnowflakeIDGenerator:
    """
    Twitter's Snowflake: 64-bit time-ordered unique ID.
    Layout: [41-bit timestamp ms][10-bit machine ID][12-bit sequence]
    Generates up to 4096 IDs/ms per machine. Sortable by creation time.
    """

    EPOCH = 1288834974657  # Twitter epoch: Nov 04, 2010
    MACHINE_ID_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, machine_id: int):
        self.machine_id = machine_id & ((1 << self.MACHINE_ID_BITS) - 1)
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Note: this sketch does not handle the clock moving backwards
        # (timestamp < last_timestamp); production Snowflake refuses to
        # issue IDs until the clock catches up.
        with self._lock:
            timestamp = int(time.time() * 1000) - self.EPOCH
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted; wait for the next millisecond
                    while timestamp <= self.last_timestamp:
                        timestamp = int(time.time() * 1000) - self.EPOCH
            else:
                self.sequence = 0
            self.last_timestamp = timestamp
            return ((timestamp << 22) |
                    (self.machine_id << 12) |
                    self.sequence)
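The bit layout is easy to verify by unpacking an ID back into its components. A small decoder, assuming the same 41/10/12 layout as the generator above:

```python
EPOCH = 1288834974657  # Twitter epoch, same as the generator

def decode_snowflake(snowflake_id: int) -> dict:
    """Split a Snowflake ID into its timestamp, machine ID, and sequence."""
    return {
        "timestamp_ms": (snowflake_id >> 22) + EPOCH,  # upper 41 bits
        "machine_id": (snowflake_id >> 12) & 0x3FF,    # middle 10 bits
        "sequence": snowflake_id & 0xFFF,              # lower 12 bits
    }

# Round-trip a hand-packed ID
sid = (1_000_000 << 22) | (7 << 12) | 42
parts = decode_snowflake(sid)
```

This is also why Snowflake IDs double as creation timestamps: sorting by ID is sorting by time, with machine ID and sequence only breaking ties within a millisecond.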
Trending Topics
class TrendingService:
    """
    Count hashtag mentions in a sliding window using time-bucketed Redis
    sorted sets. (A Count-Min Sketch is a memory-efficient alternative
    for approximate counts at extreme cardinality.)
    """

    WINDOW_SECONDS = 3600  # 1-hour trending window

    def record_hashtag(self, hashtag: str):
        # Redis sorted set: score = count, member = hashtag
        key = f"trending:{int(time.time() // 300)}"  # 5-min buckets
        redis.zincrby(key, 1, hashtag.lower())
        redis.expire(key, self.WINDOW_SECONDS * 2)

    def get_trending(self, limit: int = 10, geo: str = "global") -> list:
        # Aggregate across the last N buckets
        now_bucket = int(time.time() // 300)
        buckets = [f"trending:{now_bucket - i}" for i in range(12)]  # Last hour
        # ZUNIONSTORE to combine all buckets
        dest_key = f"trending:merged:{now_bucket}"
        redis.zunionstore(dest_key, buckets)
        redis.expire(dest_key, 300)
        return redis.zrevrange(dest_key, 0, limit - 1, withscores=True)
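What ZUNIONSTORE plus ZREVRANGE computes is just a summed top-K across the buckets. The same aggregation in plain Python (bucket contents are illustrative):

```python
def trending(buckets: list, limit: int = 3) -> list:
    """buckets: one {hashtag: count} dict per 5-minute window."""
    totals = {}
    for bucket in buckets:
        for tag, count in bucket.items():
            totals[tag] = totals.get(tag, 0) + count  # ZUNIONSTORE: sum scores
    # ZREVRANGE equivalent: highest total first, top `limit`
    return sorted(totals, key=totals.get, reverse=True)[:limit]

buckets = [{"#ai": 50, "#news": 30}, {"#news": 40}, {"#ai": 5}]
print(trending(buckets, limit=2))  # ['#news', '#ai']  (totals: 70 vs 55)
```

Because each 5-minute bucket is its own key, old data ages out simply by falling outside the 12-bucket window, with the key TTL handling cleanup.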
Interview Discussion Points
- Timeline ranking: pure reverse-chronological → engagement-weighted → ML ranking (Twitter’s algorithm is open-sourced; its heavy ranker is a ~48M-parameter neural network)
- Cache eviction: User timelines in Redis with fixed-size window (keep last 800 tweet IDs). For users who haven’t logged in for 30 days, evict and recompute on demand.
- Tweet deletion: Soft delete in DB; background job removes tweet_id from all cached timelines. Hard problem at scale — may take minutes to fully propagate.
- Retweets: Store retweet as a new tweet record with retweet_of field. Fan-out same as original tweet. Deduplicate in timeline (don’t show original + retweet).
- Lists / filtered timelines: same fan-out mechanism, but to a separate timeline key per list; a tweet fans out to a list’s timeline only if its author is a member of that list.
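The retweet de-duplication mentioned above can be sketched as a single newest-first pass that keys every tweet by its original ID (field names follow the `tweets` schema; the feed contents are illustrative):

```python
def dedupe_retweets(tweets: list) -> list:
    """Drop any tweet whose original has already appeared in the feed.

    A retweet carries retweet_of; an original tweet is keyed by its own ID.
    Assumes `tweets` is newest-first, so the first occurrence wins.
    """
    seen_originals = set()
    out = []
    for t in tweets:
        original_id = t.get("retweet_of") or t["tweet_id"]
        if original_id in seen_originals:
            continue
        seen_originals.add(original_id)
        out.append(t)
    return out

feed = [
    {"tweet_id": 5, "retweet_of": 2},     # retweet of tweet 2
    {"tweet_id": 4, "retweet_of": None},  # original
    {"tweet_id": 2, "retweet_of": None},  # original already shown as a retweet
]
deduped = dedupe_retweets(feed)  # keeps tweets 5 and 4 only
```

Showing the first occurrence (usually the retweet, since it is newer) matches the behavior described in the bullet list: the reader never sees the same content twice.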