Question 1

What is the difference between fan-out on write and fan-out on read?

Accepted Answer

Fan-out on write (push): when a user posts, the post ID is immediately pushed to every follower's feed cache. Feed reads are O(1): just read the pre-built list. Writes are expensive: O(followers) per post, which is problematic for celebrities (10M followers = 10M writes). Fan-out on read (pull): the feed is assembled at read time by fetching recent posts from every followed user. Reads are expensive: O(following) queries whose results must be merged and sorted. Writes are cheap, since each post is stored once. Hybrid: push for regular users, pull for celebrities (accounts above a follower threshold, e.g., 1M).
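The hybrid model described above can be sketched with in-memory dictionaries standing in for the real stores. This is a minimal illustration, not a production design: the names (`feed_cache`, `create_post`, `read_feed`) and the tiny `CELEBRITY_THRESHOLD` are assumptions chosen so the behavior is easy to see.

```python
from collections import defaultdict

# Hypothetical threshold, kept tiny for illustration (the text suggests ~1M).
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)         # author_id -> ids of users following the author
following = defaultdict(set)         # user_id -> ids of authors the user follows
posts_by_author = defaultdict(list)  # author_id -> [(timestamp, post_id)]
feed_cache = defaultdict(list)       # user_id -> [(timestamp, post_id)], the pre-built feed


def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD


def create_post(author_id, post_id, timestamp):
    """Store the post; return the number of fan-out writes performed."""
    posts_by_author[author_id].append((timestamp, post_id))
    if is_celebrity(author_id):
        return 0  # pull path: followers fetch this post at read time
    for follower_id in followers[author_id]:
        feed_cache[follower_id].append((timestamp, post_id))  # push path
    return len(followers[author_id])


def read_feed(user_id, k=10):
    """Pre-built entries plus posts pulled from followed celebrities, newest first."""
    entries = set(feed_cache[user_id])  # a set also removes duplicates
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            entries.update(posts_by_author[author_id])
    return [post_id for _, post_id in sorted(entries, reverse=True)[:k]]
```

The write cost moves from `create_post` to `read_feed` only for accounts above the threshold, so regular users keep the cheap pre-built read path.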

Question 2

How do you handle the celebrity problem in a social feed system?

Accepted Answer

Define a "celebrity" threshold (e.g., >1M followers). For celebrity accounts, skip fan-out on post creation. On feed generation, for each celebrity the user follows, query the celebrity's recent posts directly (they are not in the pre-built feed cache). Merge the celebrity posts with the pre-built feed using a min-heap keyed by timestamp, deduplicate, and return the top-k. The cost is a small number of additional queries per feed load, since users typically follow few celebrities. Celebrities' posts are cached separately by post ID for fast random access.
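The merge step above might look like the following sketch. It assumes every input list is already sorted newest first and holds `(timestamp, post_id)` tuples; `merge_feed` is a hypothetical helper name. The standard library's `heapq.merge` does the heap-based k-way merge.

```python
import heapq


def merge_feed(prebuilt, celebrity_timelines, k):
    """Merge newest-first timelines, drop duplicate post IDs, return the top-k IDs.

    prebuilt: [(timestamp, post_id)] from the user's feed cache, newest first.
    celebrity_timelines: one newest-first list per followed celebrity.
    """
    if k <= 0:
        return []
    # reverse=True tells heapq.merge the inputs are sorted largest-first.
    merged = heapq.merge(prebuilt, *celebrity_timelines, reverse=True)
    seen = set()
    result = []
    for _timestamp, post_id in merged:
        if post_id in seen:
            continue  # same post reached the reader through two sources
        seen.add(post_id)
        result.append(post_id)
        if len(result) == k:
            break  # merge is lazy, so older posts are never touched
    return result
```

Because the merge is lazy, the work is proportional to k plus the number of sources rather than to the total number of cached posts.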

Question 3

How do you ensure likes are idempotent?

Accepted Answer

Store a set of user_ids per post_id. In Redis, SADD post:{id}:likes user_id returns 1 if the user was added and 0 if the user was already a member, so a repeated like is a no-op. Unlike removes the user from the set (SREM). The like count is the SCARD of the set. This approach handles concurrent requests safely: two simultaneous likes from the same user result in exactly one like. For the database-backed version, put a unique constraint on (post_id, user_id) in the likes table; a duplicate insert raises IntegrityError, which the application handles as "already liked".
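Both variants can be sketched with the standard library alone. `InMemoryLikes` below is a stand-in that mimics the return values of the set commands named above; it is not a Redis client and is not thread-safe on its own. The database variant uses an in-memory SQLite table, with table and function names chosen here for illustration.

```python
import sqlite3
from collections import defaultdict


class InMemoryLikes:
    """Illustrative stand-in for a set per post (mirrors SADD/SREM/SCARD results)."""

    def __init__(self):
        self._likes = defaultdict(set)

    def like(self, post_id, user_id):
        if user_id in self._likes[post_id]:
            return 0  # already liked: no-op
        self._likes[post_id].add(user_id)
        return 1

    def unlike(self, post_id, user_id):
        if user_id not in self._likes[post_id]:
            return 0
        self._likes[post_id].remove(user_id)
        return 1

    def count(self, post_id):
        return len(self._likes[post_id])


def open_likes_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE likes ("
        "post_id TEXT NOT NULL, user_id TEXT NOT NULL, "
        "UNIQUE (post_id, user_id))"
    )
    return conn


def like_db(conn, post_id, user_id):
    """Return True if the like was recorded, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO likes (post_id, user_id) VALUES (?, ?)",
                (post_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # unique constraint hit: treat as "already liked"


def like_count_db(conn, post_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]
```

In the database variant the constraint, not the application code, is what guarantees a single row per (post, user) pair, so a retried or duplicated request cannot inflate the count.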

Question 4

How would you add an algorithmic ranking layer to the feed?

Accepted Answer

After generating the reverse-chronological candidate set (top-200 posts), score each post with a ranking model: score = recency_weight * time_decay + engagement_weight * (likes + reposts + comments) + relationship_weight * closeness_score + relevance_weight * topic_affinity. Return the top-k by score. Features come from fast stores: post engagement from Redis counters, user relationship score from a graph feature store, topic affinity from the user's recent interaction history. The ranking model runs in milliseconds (lightweight ML model or heuristic).
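The scoring formula above translates fairly directly into code. In this sketch the weights, the six-hour half-life, and the exponential form of `time_decay` are all assumptions made for illustration; the text does not specify them. The raw engagement sum is used as written, although a real system would probably normalize it.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    age_seconds: float
    likes: int
    reposts: int
    comments: int
    closeness_score: float  # assumed to be in [0, 1]
    topic_affinity: float   # assumed to be in [0, 1]


# Hypothetical weights; in practice these would be tuned or learned.
WEIGHTS = {"recency": 1.0, "engagement": 0.01, "relationship": 0.5, "relevance": 0.5}
HALF_LIFE_SECONDS = 6 * 3600  # assumed: recency halves every six hours


def time_decay(age_seconds):
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / HALF_LIFE_SECONDS)


def score(c, w=WEIGHTS):
    engagement = c.likes + c.reposts + c.comments
    return (
        w["recency"] * time_decay(c.age_seconds)
        + w["engagement"] * engagement
        + w["relationship"] * c.closeness_score
        + w["relevance"] * c.topic_affinity
    )


def rank(candidates, k):
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=score)
```

With a candidate set of about 200 posts, scoring and `heapq.nlargest` are cheap; the feature lookups are likely to dominate latency.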

Question 5

How would you scale follows and the social graph to billions of users?

Accepted Answer

Store the follow graph in a distributed adjacency list: Cassandra with partition key = user_id, clustering key = followee_id (for following list) and a reverse table with partition key = followee_id (for followers list). Each row is one edge: compact, with O(1) follow/unfollow. For fan-out, use a message queue (Kafka): post creation emits an event, a fan-out worker reads followers from Cassandra in batches and writes to Redis feed caches. This decouples post creation latency from fan-out time. Follower counts are cached in Redis counters (INCR/DECR on follow/unfollow).
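The shape of this design can be sketched in memory: two dictionaries play the role of the forward and reverse tables, a `deque` stands in for the message queue, and a plain dictionary stands in for the feed caches. None of this talks to Cassandra, Kafka, or Redis; class and function names are hypothetical. The counter is only changed when an edge is actually added or removed, since a bare increment is not idempotent.

```python
from collections import defaultdict, deque
from itertools import islice


class FollowGraph:
    def __init__(self):
        self.following = defaultdict(set)       # forward table: user -> followees
        self.followers = defaultdict(set)       # reverse table: followee -> users
        self.follower_count = defaultdict(int)  # stand-in for a cached counter

    def follow(self, user_id, followee_id):
        if followee_id in self.following[user_id]:
            return False  # edge already exists: do not bump the counter
        self.following[user_id].add(followee_id)
        self.followers[followee_id].add(user_id)
        self.follower_count[followee_id] += 1
        return True

    def unfollow(self, user_id, followee_id):
        if followee_id not in self.following[user_id]:
            return False
        self.following[user_id].remove(followee_id)
        self.followers[followee_id].remove(user_id)
        self.follower_count[followee_id] -= 1
        return True

    def followers_in_batches(self, followee_id, batch_size):
        """Yield follower IDs in fixed-size batches, like a paged read."""
        it = iter(sorted(self.followers[followee_id]))
        while batch := list(islice(it, batch_size)):
            yield batch


post_events = deque()           # stand-in for the post-created topic
feed_cache = defaultdict(list)  # stand-in for per-user feed lists, newest first


def create_post(author_id, post_id):
    post_events.append((author_id, post_id))  # returns without doing any fan-out


def run_fanout_worker(graph, batch_size=2):
    """Drain queued events; return the number of feed writes performed."""
    writes = 0
    while post_events:
        author_id, post_id = post_events.popleft()
        for batch in graph.followers_in_batches(author_id, batch_size):
            for follower_id in batch:
                feed_cache[follower_id].insert(0, post_id)
                writes += 1
    return writes
```

`create_post` only enqueues an event, so its latency does not depend on the follower count; all O(followers) work happens later in `run_fanout_worker`.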

Decision	Choice	Trade-off
Feed generation	Hybrid push/pull	Fast reads for regular users; celebrities use pull to avoid 10M fan-out writes
Like storage	Set per post	O(1) toggle and lookup; in production use Redis SET or DB with unique constraint
Follow graph	In-memory sets	For LLD; production uses Graph DB (Neo4j) or adjacency list in Cassandra
Feed ordering	Reverse chronological (min-heap)	Simple; algorithmic ranking adds engagement signals

Low-Level Design: Social Media Feed (Follow, Post, Fan-out, Likes)

Core Entities

Follow/Unfollow Service

Feed Service — Fan-out on Write (Push Model)

Like Service with Idempotency

Design Trade-offs
