System Design Interview: Design a Social Graph (Friend Connections)

What Is a Social Graph?

A social graph represents relationships between users: friendships (Facebook), follow relationships (Twitter/Instagram), professional connections (LinkedIn). Core operations: add/remove edge, check if two users are connected, find mutual friends, suggest new connections (friends-of-friends), and traverse degrees of separation. LinkedIn has 1B nodes and 30B edges; Facebook’s social graph has 3B nodes.

  • Coinbase Interview Guide
  • Airbnb Interview Guide
  • Snap Interview Guide
  • Twitter Interview Guide
  • LinkedIn Interview Guide
  • Meta Interview Guide
  • Graph Representation

    Adjacency list stored in a relational DB or graph DB:

    -- Undirected (friendship):
    friendships: user_id_1, user_id_2, created_at
    -- Directed (follow):
    follows: follower_id, followee_id, created_at
    

    Indexes: (user_id_1, user_id_2) unique for O(1) connection check; (user_id_1) for O(degree) neighbor lookup; (user_id_2) for reverse lookup.

    Mutual Friends

    Mutual friends between user A and user B = intersection of A’s friends and B’s friends. Naive: SELECT friends of A, SELECT friends of B, intersect in application code. Better with SQL: SELECT friend_id FROM friendships WHERE user_id = A INTERSECT SELECT friend_id FROM friendships WHERE user_id = B. For high-degree nodes (celebrities with 10M followers): this query is expensive. Cache friend sets in Redis sorted sets; compute intersection in Redis: SINTERSTORE result friends:A friends:B. Redis SINTER on sets of size M runs in O(M * N) where M is the smaller set — fast for typical friend counts.

    Friend-of-Friend Suggestions

    Suggest users who are 2 hops away (friends of friends) and not already friends. Algorithm:

    1. Get user’s friends (1-hop neighbors): set F
    2. Get all friends of friends (2-hop): union of F[f] for all f in F
    3. Remove already-known users (user themselves + F)
    4. Rank by mutual friend count (higher = stronger suggestion)

    At scale: 2-hop traversal can explode — a user with 1000 friends, each with 1000 friends = 1M candidates before dedup. Precompute top-K suggestions per user with a weekly batch job (Spark), store results in a suggestions table. Serve from cache.

    Graph Database vs. Relational

    Relational (PostgreSQL): works well for 1-2 hop queries with good indexing. Struggles with deep traversals (6 degrees of separation). Schema: foreign keys, joins for every hop.

    Graph DB (Neo4j, Amazon Neptune): optimized for multi-hop traversal. Stores edges with direct pointers (no join needed). MATCH (a)-[:FRIEND*2]->(b) WHERE a.id=123 traverses 2 hops natively. Better for: recommendation engines, fraud ring detection, org-chart traversal.

    LinkedIn uses a custom distributed graph store (Espresso + custom graph layer). Facebook uses TAO (The Associations and Objects) — a distributed key-value store with graph semantics built on MySQL.

    High-Degree Node Problem (Celebrities)

    A celebrity with 10M followers creates challenges: storing 10M edges takes significant space, reading them for fan-out (timeline) is expensive, and mutual-friend queries become slow. Solutions:

    • Edge splitting: for “following” (asymmetric), only store follower→celebrity edges; fan-out is pull-based
    • Separate data paths: celebrity posts distributed via CDN/pub-sub rather than iterating all followers
    • Approximate mutual friends: HyperLogLog for approximate intersection count when exact is too slow

    Interview Tips

    • Friendship (undirected): store once as (min(A,B), max(A,B)) to avoid duplicate edges.
    • Connection check: O(1) with index on both users; don’t iterate all edges.
    • BFS for shortest path (6 degrees): bidirectional BFS from both ends meets in the middle — O(b^(d/2)) instead of O(b^d).
    • High-degree nodes: always ask about the distribution — if power-law, need special handling for celebrities.
    Scroll to Top