What Is a Social Media Feed System?
A social media feed aggregates and ranks posts from followed users and surfaces them in a personalized order. Examples: Twitter/X Home Timeline, Instagram Feed, LinkedIn Feed. Core challenges: fanout at scale (a post by a celebrity with 50M followers must populate 50M feeds), ranking (relevance over chronological order), and low read latency (<100ms to load the feed).
System Requirements
Functional
- User posts content; followers see it in their feed
- Feed is ranked by relevance, not just chronology
- Pagination: infinite scroll with cursor-based pagination
- Real-time updates for followed users’ new posts
Non-Functional
- Scale: 10M DAU, 100K posts/second at peak
Fanout Strategies
Fanout on Write (Push Model)
When a user posts, immediately write the post_id to each follower’s feed cache. Fast reads (pre-computed). Expensive writes for users with many followers. Works well for “regular” users (<10K followers).
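import time
import redis

r = redis.Redis()  # assumes a reachable Redis instance

def on_post_created(post_id, author_id):
    now = time.time()  # initial score; ranking may rescore later
    for follower_id in get_followers(author_id):
        r.zadd(f'feed:{follower_id}', {post_id: now})  # sorted set, matching the read path
        r.zremrangebyrank(f'feed:{follower_id}', 0, -1001)  # keep the top 1000 posts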
Fanout on Read (Pull Model)
When a user loads their feed, fetch the latest posts from each followed user and merge. Expensive reads. Works well for celebrities/influencers (>10K followers) — avoids writing to millions of feeds.
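The merge step can be sketched as a k-way merge over per-author timelines. This is a minimal in-memory illustration under stated assumptions, not a production design: `TIMELINES`, `FOLLOWING`, and `pull_feed` are hypothetical names, and each timeline is assumed to be stored newest-first already.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the posts and follows stores.
# Each timeline is a list of (created_at, post_id), newest first.
TIMELINES = {
    "alice": [(105, "a3"), (101, "a2"), (90, "a1")],
    "bob": [(104, "b2"), (95, "b1")],
    "carol": [(110, "c1")],
}
FOLLOWING = {"viewer": ["alice", "bob", "carol"]}


def pull_feed(user_id, limit=20, per_author=50):
    """Merge the most recent posts of every followed author, newest first."""
    streams = [
        TIMELINES.get(author, [])[:per_author]
        for author in FOLLOWING.get(user_id, [])
    ]
    # heapq.merge lazily merges already-sorted inputs; reverse=True because
    # each stream is sorted in descending created_at order.
    merged = heapq.merge(*streams, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

The read cost grows with the number of followed authors, which is why the pull model is usually reserved for the small set of high-follower accounts.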
Hybrid Model (Production Standard)
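Use push for regular users and pull for celebrities. Threshold: if a followee has more than 10K followers, skip the push and compute on read. At read time: load the pre-computed feed from Redis, fetch recent posts from followed celebrities, then merge and re-rank. Large platforms such as Twitter have publicly described hybrid designs along these lines; the exact threshold is a tuning parameter, and 10K is an illustrative value.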
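A rough sketch of the two decisions in the hybrid model: whether to push at write time, and how to merge at read time. The names `should_push` and `hybrid_feed` are hypothetical, and sorting by a precomputed score stands in for the re-ranking step, which in a real system would call a ranking model.

```python
CELEBRITY_THRESHOLD = 10_000  # follower count above which push is skipped


def should_push(follower_count):
    """Write-time decision: fan out only for authors at or under the threshold."""
    return follower_count <= CELEBRITY_THRESHOLD


def hybrid_feed(precomputed, celebrity_posts, limit=20):
    """Read-time merge of the pushed feed with pulled celebrity posts.

    Both inputs are lists of (score, post_id). Duplicates are dropped by
    post_id, keeping the higher score, and the result is sorted by score.
    """
    best = {}
    for score, post_id in list(precomputed) + list(celebrity_posts):
        if post_id not in best or score > best[post_id]:
            best[post_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [post_id for post_id, _ in ranked[:limit]]
```

Deduplicating by post_id matters in practice: an account that crosses the threshold may have recent posts in both the pushed feed and the pulled set.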
Feed Ranking
Chronological is simple but suboptimal. Ranked feeds use ML models. Features used:
- Post signals: age, engagement rate (likes/views), media type (video ranks higher)
- Author signals: closeness to viewer (interaction history), account age
- Viewer signals: historical engagement with similar content, time of day
Ranking pipeline: candidate retrieval (top 500 posts from pool) → lightweight scoring model (logistic regression, about 1 ms) → top 100 → heavy ranking model (neural net) → top 20 → diversity filter (avoid same author twice in a row) → final feed.
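The pipeline shape can be illustrated with toy scoring functions. Everything here is an assumption made for the example: the feature names, the weights in `light_score`, and `heavy_score`, which is only a placeholder for the expensive model. The point is the funnel (cheap score, cut, expensive score, cut, diversity pass), not the scores themselves.

```python
import math


def light_score(post):
    """Cheap first-stage score: a logistic function over two features.
    The weights are made up for illustration."""
    z = 4.0 * post["engagement_rate"] - 0.1 * post["age_hours"]
    return 1.0 / (1.0 + math.exp(-z))


def heavy_score(post):
    """Placeholder for the expensive model (a neural net in the text)."""
    return light_score(post) + 0.5 * post["author_affinity"]


def diversify(posts):
    """Greedy reorder so the same author does not appear twice in a row
    whenever a post by a different author is still available."""
    remaining, out = list(posts), []
    while remaining:
        pick = next(
            (p for p in remaining if not out or p["author"] != out[-1]["author"]),
            remaining[0],  # only one author left: accept the repeat
        )
        remaining.remove(pick)
        out.append(pick)
    return out


def rank_feed(candidates, light_k=100, final_k=20):
    """Two-stage ranking followed by the diversity filter."""
    stage1 = sorted(candidates, key=light_score, reverse=True)[:light_k]
    stage2 = sorted(stage1, key=heavy_score, reverse=True)[:final_k]
    return diversify(stage2)
```

Because the heavy model only sees the survivors of the first cut, its cost is bounded by `light_k` regardless of how large the candidate pool is.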
Feed Storage
posts: id, author_id, content, media_url, created_at, like_count, comment_count
follows: follower_id, followee_id, created_at
feed_cache: Redis sorted set per user; members are post_ids, ordered by score (relevance * timestamp)
Read Path
- Read from Redis: ZREVRANGE feed:{user_id} 0 19 WITHSCORES — top 20 posts by score
- Fetch post content: Redis hash or Cassandra lookup by post_id
- Fetch engagement counts: Redis counters (updated in real-time)
- Return to client with cursor for next page
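The four steps above can be sketched with plain dictionaries standing in for the stores. `FEED`, `POSTS`, `LIKES`, and `read_feed` are hypothetical names for this illustration; a real implementation would issue the Redis and Cassandra lookups the list describes, ideally batched.

```python
# Stand-ins for the per-user sorted set, the post store, and the counters.
FEED = {"u1": [(9.0, "p3"), (7.5, "p1"), (2.0, "p2")]}  # (score, post_id), best first
POSTS = {
    "p1": {"content": "hello"},
    "p2": {"content": "bye"},
    "p3": {"content": "hi"},
}
LIKES = {"p1": 4, "p3": 10}


def read_feed(user_id, limit=20):
    """Load one page of a feed and hydrate it for the client."""
    entries = FEED.get(user_id, [])[:limit]          # step 1: top ids by score
    items = []
    for _score, post_id in entries:
        post = POSTS.get(post_id)                    # step 2: fetch post content
        if post is None:
            continue                                 # post was deleted: skip it
        likes = LIKES.get(post_id, 0)                # step 3: engagement counts
        items.append({"id": post_id, **post, "likes": likes})
    next_cursor = entries[-1][0] if entries else None  # step 4: score of last entry
    return {"items": items, "next_cursor": next_cursor}
```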
Cursor-Based Pagination
Avoid offset pagination (LIMIT 20 OFFSET 100): new posts inserted between page 1 and page 2 shift everything, so the client sees duplicates. Use a cursor instead: the last_seen_post_id or last_score. "Give me 20 posts with score < cursor_score." The cursor is returned to the client and sent back on the next request.
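A small sketch of why the cursor is stable under inserts. `get_page` is a hypothetical helper operating on an in-memory list. It uses the (score, post_id) pair as the cursor so that posts with equal scores are not skipped, which is an assumption beyond the score-only cursor described above. The linear scan is for clarity only; with a Redis sorted set the lookup would typically be a score-range query with an exclusive bound.

```python
def get_page(feed, cursor=None, limit=20):
    """Return one page of a feed plus the cursor for the next page.

    `feed` is a list of (score, post_id) sorted in descending order, a
    stand-in for the per-user sorted set. `cursor` is the (score, post_id)
    of the last item the client has seen, or None for the first page.
    """
    if cursor is None:
        start = 0
    else:
        # Find the first item strictly after the cursor position. Comparing
        # the whole (score, post_id) pair breaks ties between equal scores.
        start = next(
            (i for i, item in enumerate(feed) if item < cursor), len(feed)
        )
    page = feed[start:start + limit]
    next_cursor = page[-1] if len(page) == limit else None
    return [post_id for _, post_id in page], next_cursor
```

If a new post lands at the top of the feed between two requests, the second call still resumes right after the last item seen, whereas an offset of `limit` would return that item again.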
Feed Cache TTL and Eviction
Feed caches for inactive users waste memory. TTL: if user has not logged in for 7 days, let feed cache expire. On next login, generate the feed from scratch (cold start): pull latest 200 posts from followed users and rank them. Pre-warm on login for returning inactive users (triggered when session is created).
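The expiry and cold-start behavior can be sketched with a toy cache. `FeedCache` is a hypothetical class for illustration, and the `rebuild` callback stands in for the "pull latest 200 posts and rank them" step. With Redis, a similar effect would typically come from refreshing an expiry on the feed key whenever the user logs in.

```python
import time

FEED_TTL_SECONDS = 7 * 24 * 3600  # 7 days, as in the text


class FeedCache:
    """Toy per-user feed cache with a sliding TTL."""

    def __init__(self, rebuild, ttl=FEED_TTL_SECONDS, clock=time.time):
        self._rebuild = rebuild  # cold-start function: user_id -> list of post ids
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # user_id -> (expires_at, feed)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is None or entry[0] <= now:
            feed = self._rebuild(user_id)  # missing or expired: cold start
        else:
            feed = entry[1]
        # Every read pushes the expiry out again, so active users stay cached.
        self._entries[user_id] = (now + self._ttl, feed)
        return feed
```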
Interview Tips
- Hybrid push/pull with the 10K follower threshold is the key insight.
- Cursor-based pagination prevents duplicate posts across pages.
- Two-stage ranking (lightweight then heavy) balances latency and quality.
- Separate the fanout service from the post service — fanout is asynchronous.
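The last tip, asynchronous fanout, can be illustrated with a queue between the two services. This is a single-process sketch under stated assumptions: `create_post`, `fanout_worker`, `FOLLOWERS`, and `FEEDS` are hypothetical names, and the in-process `queue.Queue` stands in for whatever durable message broker a real deployment would use.

```python
import queue
import threading
from collections import defaultdict

FOLLOWERS = {"alice": ["bob", "carol"]}  # hypothetical follow graph
FEEDS = defaultdict(list)                # follower_id -> post ids, newest first
fanout_queue = queue.Queue()


def create_post(post_id, author_id):
    """Post service: persist the post (omitted) and enqueue a fanout job."""
    fanout_queue.put((post_id, author_id))
    return post_id  # returns without waiting for the fanout to finish


def fanout_worker():
    """Fanout service: drain jobs and write to each follower's feed."""
    while True:
        job = fanout_queue.get()
        if job is None:  # sentinel: stop the worker
            fanout_queue.task_done()
            break
        post_id, author_id = job
        for follower_id in FOLLOWERS.get(author_id, []):
            FEEDS[follower_id].insert(0, post_id)
        fanout_queue.task_done()
```

The author's request completes as soon as the job is enqueued, so write latency does not depend on follower count; the trade-off is that followers see the post only after the worker catches up.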