What Is a Social Media Feed System?
A social media feed aggregates and ranks posts from followed users and surfaces them in a personalized order. Examples: Twitter/X Home Timeline, Instagram Feed, LinkedIn Feed. Core challenges: fanout at scale (a post by a celebrity with 50M followers must populate 50M feeds), ranking (relevance over chronological order), and low read latency (<100ms to load the feed).
System Requirements
Functional
- User posts content; followers see it in their feed
- Feed is ranked by relevance, not just chronology
- Pagination: infinite scroll with cursor-based pagination
- Real-time updates for followed users’ new posts
- Scale target (a non-functional constraint, listed here for sizing): 10M DAU, 100K posts/second at peak
Fanout Strategies
Fanout on Write (Push Model)
When a user posts, immediately write the post_id to each follower’s feed cache. Fast reads (pre-computed). Expensive writes for users with many followers. Works well for “regular” users (<10K followers).
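def on_post_created(post_id, author_id):
    # Assumes `redis` is a connected redis-py client and get_followers()
    # returns the follower IDs for this author from the follows table.
    followers = get_followers(author_id)
    for follower_id in followers:
        key = f'feed:{follower_id}'
        redis.lpush(key, post_id)  # newest post goes to the head of the list
        redis.ltrim(key, 0, 999)   # keep only the newest 1000 posts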
Fanout on Read (Pull Model)
When a user loads their feed, fetch the latest posts from each followed user and merge. Expensive reads. Works well for celebrities/influencers (>10K followers) — avoids writing to millions of feeds.
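The pull model can be sketched as a k-way merge over each followee's recent posts. This is a minimal illustration, not a production design: `POSTS_BY_AUTHOR`, `recent_posts`, and `build_feed_on_read` are hypothetical names, and an in-memory dict stands in for the posts store.

```python
import heapq
from itertools import islice

# Hypothetical in-memory store: author_id -> posts as (created_at, post_id),
# each list sorted newest first. A real system would query the posts table.
POSTS_BY_AUTHOR = {
    "alice": [(105, "a3"), (101, "a2"), (90, "a1")],
    "bob": [(103, "b2"), (95, "b1")],
    "carol": [(104, "c1")],
}

def recent_posts(author_id, limit):
    """Return up to `limit` newest posts for one author (newest first)."""
    return POSTS_BY_AUTHOR.get(author_id, [])[:limit]

def build_feed_on_read(followee_ids, page_size=20, per_author_limit=50):
    """Merge each followee's recent posts into one newest-first feed."""
    per_author = [recent_posts(a, per_author_limit) for a in followee_ids]
    # heapq.merge lazily merges already-sorted inputs; reverse=True because
    # every input list is sorted in descending order.
    merged = heapq.merge(*per_author, reverse=True)
    return [post_id for _, post_id in islice(merged, page_size)]

feed = build_feed_on_read(["alice", "bob", "carol"], page_size=4)
```

The read cost grows with the number of followees, which is why this path is expensive when used for every account a user follows.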
Hybrid Model (Production Standard)
Use push for regular users, pull for celebrities. Threshold: if the followee has >10K followers, skip the push and compute on read. At read time: load the pre-computed feed from Redis, fetch recent posts from followed celebrities, then merge and re-rank. Large platforms such as Twitter and Instagram are commonly described as using variants of this approach.
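A minimal sketch of the hybrid decision, assuming the 10K threshold from the text. The dicts below are hypothetical in-memory stand-ins for the follow graph, the Redis feed cache, and the posts store, and the re-rank step is plain recency rather than a real ranking model.

```python
CELEBRITY_THRESHOLD = 10_000  # follower count above which push is skipped

# Hypothetical in-memory stand-ins for the follow graph, feed cache and posts.
FOLLOWERS = {"alice": ["dave", "erin"], "star": ["dave"]}
FOLLOWER_COUNT = {"alice": 2, "star": 50_000_000}
FEED_CACHE = {}       # user_id -> [(created_at, post_id)], newest first
CELEBRITY_POSTS = {}  # author_id -> [(created_at, post_id)], newest first

def on_post_created(post_id, author_id, created_at):
    if FOLLOWER_COUNT[author_id] > CELEBRITY_THRESHOLD:
        # Celebrity: store the post once and let readers pull it.
        CELEBRITY_POSTS.setdefault(author_id, []).insert(0, (created_at, post_id))
        return
    # Regular user: push to every follower's pre-computed feed.
    for follower_id in FOLLOWERS[author_id]:
        FEED_CACHE.setdefault(follower_id, []).insert(0, (created_at, post_id))

def read_feed(user_id, followed_celebrities, page_size=20):
    pushed = FEED_CACHE.get(user_id, [])
    pulled = [p for c in followed_celebrities for p in CELEBRITY_POSTS.get(c, [])]
    # Re-rank step: recency only here; a ranking model would replace this sort.
    merged = sorted(pushed + pulled, reverse=True)
    return [post_id for _, post_id in merged[:page_size]]

on_post_created("a1", "alice", created_at=100)
on_post_created("s1", "star", created_at=101)
on_post_created("a2", "alice", created_at=102)
dave_feed = read_feed("dave", ["star"])
erin_feed = read_feed("erin", [])
```

Here the celebrity post costs one write regardless of follower count, and only readers who follow that celebrity pay for the extra fetch.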
Feed Ranking
Chronological is simple but suboptimal. Ranked feeds use ML models. Features used:
- Post signals: age, engagement rate (likes/views), media type (video ranks higher)
- Author signals: closeness to viewer (interaction history), account age
- Viewer signals: historical engagement with similar content, time of day
Ranking pipeline: candidate retrieval (top 500 posts from the pool) → lightweight scoring model (logistic regression, on the order of 1 ms) → top 100 → heavy ranking model (neural net) → top 20 → diversity filter (avoid showing the same author twice in a row) → final feed.
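The funnel above can be sketched as follows. The feature weights, the `Candidate` fields, and `heavy_score` (a placeholder for the neural model) are all invented for illustration; only the stage order comes from the text.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: str
    author_id: str
    age_hours: float
    engagement_rate: float  # e.g. likes / views
    author_affinity: float  # 0..1, closeness of viewer to author

def light_score(c):
    """Cheap logistic score; the weights are made up for illustration."""
    z = 3.0 * c.engagement_rate + 2.0 * c.author_affinity - 0.1 * c.age_hours
    return 1.0 / (1.0 + math.exp(-z))

def heavy_score(c):
    """Placeholder for the expensive model (a neural net in the text)."""
    return light_score(c) * (1.0 + c.author_affinity)

def diversify(ranked):
    """Greedy pass that avoids the same author twice in a row when possible."""
    result, pending = [], list(ranked)
    while pending:
        last_author = result[-1].author_id if result else None
        # Best remaining post by a different author; fall back to index 0.
        idx = next((i for i, c in enumerate(pending) if c.author_id != last_author), 0)
        result.append(pending.pop(idx))
    return result

def rank_feed(candidates, light_k=100, final_k=20):
    stage1 = sorted(candidates, key=light_score, reverse=True)[:light_k]
    stage2 = sorted(stage1, key=heavy_score, reverse=True)[:final_k]
    return diversify(stage2)

pool = [
    Candidate("p1", "alice", 1.0, 0.30, 0.9),
    Candidate("p2", "alice", 2.0, 0.25, 0.9),
    Candidate("p3", "bob", 1.0, 0.20, 0.5),
    Candidate("p4", "carol", 30.0, 0.05, 0.1),
]
feed = rank_feed(pool, light_k=3, final_k=3)
```

The cheap model runs on every candidate while the expensive one only sees the survivors, which is what keeps the total latency bounded.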
Feed Storage
posts: id, author_id, content, media_url, created_at, like_count, comment_count
follows: follower_id, followee_id, created_at
feed_cache: one Redis sorted set per user (key feed:{user_id}); members are post_ids, each scored by a ranking value (e.g. relevance * timestamp)
Read Path
- Read from Redis: ZREVRANGE feed:{user_id} 0 19 WITHSCORES — top 20 posts by score
- Fetch post content: Redis hash or Cassandra lookup by post_id
- Fetch engagement counts: Redis counters (updated in real-time)
- Return to client with cursor for next page
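The four read-path steps can be sketched end to end. This is an assumption-laden illustration: plain dicts stand in for the Redis sorted set, the post store, and the engagement counters, and `load_feed_page` is a hypothetical name.

```python
# Hypothetical in-memory stand-ins for the three stores named above.
FEED = {"u1": {"p1": 30.0, "p2": 20.0, "p3": 10.0}}  # sorted set: post_id -> score
POSTS = {
    "p1": {"content": "hello"},
    "p2": {"content": "world"},
    "p3": {"content": "!"},
}
LIKES = {"p1": 5, "p2": 0, "p3": 12}

def load_feed_page(user_id, page_size=20):
    # Step 1: top post_ids by score (what ZREVRANGE ... WITHSCORES returns).
    scored = sorted(FEED.get(user_id, {}).items(), key=lambda kv: kv[1], reverse=True)
    page = scored[:page_size]
    # Steps 2 and 3: hydrate content and attach live engagement counts.
    items = [
        {"post_id": pid, "score": score, **POSTS[pid], "like_count": LIKES.get(pid, 0)}
        for pid, score in page
    ]
    # Step 4: hand back the last score as the cursor for the next page.
    next_cursor = page[-1][1] if len(page) == page_size else None
    return {"items": items, "next_cursor": next_cursor}

page = load_feed_page("u1", page_size=2)
```

Keeping only post_ids in the feed cache means an edited post or an updated like count never requires rewriting follower feeds.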
Cursor-Based Pagination
Avoid offset pagination (LIMIT 20 OFFSET 100): a new post inserted between the page 1 and page 2 requests shifts every later item, so the client sees duplicates. Use a cursor instead: the last_seen_post_id or last_score. The next request asks, "Give me 20 posts with score < cursor_score." The cursor is returned to the client with each page and sent back on the next request.
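A minimal sketch of cursor pagination over an in-memory feed. The opaque token format (base64-encoded JSON) and the use of post_id as a tie-breaker are assumptions added here; without a tie-breaker, a plain "score < cursor_score" filter would skip posts that share the cursor's score.

```python
import base64
import json

# Hypothetical feed: (score, post_id) pairs. Two posts share a score on purpose.
FEED = [(50.0, "p5"), (40.0, "p4"), (40.0, "p3"), (20.0, "p2"), (10.0, "p1")]

def encode_cursor(score, post_id):
    raw = json.dumps([score, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token):
    score, post_id = json.loads(base64.urlsafe_b64decode(token.encode()))
    return (score, post_id)

def get_page(cursor=None, page_size=2):
    ordered = sorted(FEED, reverse=True)  # highest score first
    if cursor is not None:
        bound = decode_cursor(cursor)
        # Strictly after the last item seen; post_id breaks score ties.
        ordered = [item for item in ordered if item < bound]
    page = ordered[:page_size]
    next_cursor = encode_cursor(*page[-1]) if page else None
    return [pid for _, pid in page], next_cursor

page1, cur1 = get_page()
FEED.append((60.0, "p6"))  # a new post arrives between the two requests
page2, cur2 = get_page(cur1)
```

The new post sorts above the cursor, so it does not disturb page 2; it would show up on the next refresh from the top.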
Feed Cache TTL and Eviction
Feed caches for inactive users waste memory. TTL: if user has not logged in for 7 days, let feed cache expire. On next login, generate the feed from scratch (cold start): pull latest 200 posts from followed users and rank them. Pre-warm on login for returning inactive users (triggered when session is created).
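The expiry and cold-start behaviour can be sketched with a small in-memory cache. `FeedCache`, `rebuild_feed`, and `on_session_created` are hypothetical names; in Redis the same effect would typically come from setting an expiry on the feed key, and `rebuild_feed` here only fabricates placeholder IDs in place of the real pull-and-rank step.

```python
import time

FEED_TTL_SECONDS = 7 * 24 * 3600  # 7 days, as in the text

class FeedCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, post_ids)

    def get(self, user_id):
        entry = self._entries.get(user_id)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(user_id, None)  # missing or expired
            return None
        return entry[1]

    def put(self, user_id, post_ids):
        self._entries[user_id] = (self._clock() + FEED_TTL_SECONDS, post_ids)

def rebuild_feed(user_id):
    """Hypothetical cold-start path: pull and rank the latest 200 posts."""
    return [f"post-{i}" for i in range(200)]

def on_session_created(cache, user_id):
    feed = cache.get(user_id)
    if feed is None:
        feed = rebuild_feed(user_id)  # cold start for a returning user
    cache.put(user_id, feed)          # refresh the TTL on every login
    return feed
```

Injecting the clock keeps the expiry logic testable without waiting seven days.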
Interview Tips
- Hybrid push/pull with the 10K follower threshold is the key insight.
- Cursor-based pagination prevents duplicate posts across pages.
- Two-stage ranking (lightweight then heavy) balances latency and quality.
- Separate the fanout service from the post service — fanout is asynchronous.
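The last tip, keeping fanout asynchronous and separate from the post service, can be illustrated with a queue between the two. This is a single-process sketch under assumptions: a `deque` stands in for a message broker, and `create_post` and `run_fanout_worker` are hypothetical names.

```python
from collections import deque

FANOUT_QUEUE = deque()  # stand-in for a message broker
FEEDS = {}              # follower_id -> post_ids, newest first
FOLLOWERS = {"alice": ["dave", "erin"]}

def create_post(post_id, author_id):
    """Post service: enqueue a fanout event and return without waiting."""
    FANOUT_QUEUE.append({"post_id": post_id, "author_id": author_id})
    return {"status": "created", "post_id": post_id}

def run_fanout_worker():
    """Fanout service: drain the queue and update follower feeds."""
    processed = 0
    while FANOUT_QUEUE:
        event = FANOUT_QUEUE.popleft()
        for follower_id in FOLLOWERS.get(event["author_id"], []):
            FEEDS.setdefault(follower_id, []).insert(0, event["post_id"])
        processed += 1
    return processed
```

The author gets a response as soon as the event is queued; follower feeds catch up when the worker runs.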
Frequently Asked Questions
What is the hybrid push-pull model for social media feed fanout?
The hybrid model applies push (fanout on write) for regular users and pull (fanout on read) for high-follower users, typically above a threshold of 10K-100K followers. For regular users: when they post, their post_id is pushed to each follower's feed cache in Redis. Feed reads are instant because the cache is pre-computed. For celebrities (50M followers): pushing to 50M Redis keys on every post is expensive and slow. Instead, skip the push entirely. At read time: load the user's pre-computed feed from regular users they follow, then separately fetch the most recent N posts from each celebrity they follow, merge, and re-rank. The number of celebrities a user follows is typically small (<10), so the per-read fetches are bounded. This is why your Instagram feed loads fast: most of it was pre-computed, and the few celebrity posts you follow are fetched on demand. The threshold is tunable: 10K, 100K, or 1M depending on the write/read cost tradeoff for the specific platform.
How does cursor-based pagination prevent duplicate posts in an infinite scroll feed?
Offset-based pagination (LIMIT 20 OFFSET 40) has a critical flaw: if a new post is inserted between your first and second page request, every post shifts by one position, and you see a duplicate (the last post of page 1 reappears as the first post of page 2). Cursor-based pagination uses a stable reference point. The server returns a cursor with each page: the score or timestamp of the last item delivered. The next request says "give me 20 items with score < cursor." New posts have higher scores/newer timestamps and appear before the cursor, so they show up at the top on pull-to-refresh, not in the middle of the feed. The cursor is typically an opaque token (base64-encoded score + timestamp) that the client sends back. Database implementation: "WHERE score < ? ORDER BY score DESC LIMIT 20" using the cursor as the WHERE bound. This is stable regardless of concurrent inserts.
How do you implement real-time feed updates without polling?
Real-time feed updates (new posts appearing in your feed as they happen) require a push mechanism from server to client. Options: WebSocket (bidirectional, persistent TCP connection; used by Twitter), Server-Sent Events (one-way server push over HTTP; simpler, auto-reconnects), or long polling (fallback for environments blocking WebSocket). Architecture: when a new post is created and fanout completes (pushed to follower feed caches in Redis), the notification service publishes a "new feed item" event to a Redis pub/sub channel keyed by user_id. Connection servers (one per ~50K concurrent connections) subscribe to these channels. On receiving a pub/sub message, the connection server pushes a "new post available" signal to the connected client. The client either appends the post directly or shows a "N new posts" banner. For mobile clients: use APNs/FCM push notification to wake the app, then the app fetches the new feed item via REST. WebSocket for web clients and APNs/FCM for mobile is the standard pattern.