System Design Interview: Design Instagram / Photo Sharing Platform

Designing Instagram tests your knowledge of media storage, feed generation, CDN architecture, and massive read/write asymmetry. It’s a top-10 system design interview question at companies like Meta, Snap, Twitter, and Pinterest.

Functional Requirements

  • Upload and share photos and short videos
  • Follow/unfollow other users
  • Generate a personalized home feed of posts from followed accounts
  • Like and comment on posts
  • Search users and hashtags
  • Send and receive direct messages

Non-Functional Requirements

  • 500M daily active users; 100M photos uploaded per day
  • 2B feed reads per day; reads >> writes (~20x the upload rate, higher still counting likes and views)
  • Photo upload latency: <2 seconds for processing
  • Feed generation: <500ms p99
  • High availability (99.99%); eventual consistency acceptable for feeds

Capacity Estimation

  • Photos: 100M/day × 500KB avg = 50TB/day → ~18PB/year
  • Photo metadata: 100M/day × 100B = 10GB/day
  • Feed reads: 2B/day → ~23,000 QPS average, ~100K QPS peak
  • Photo uploads: 100M/day → ~1,150 QPS
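These figures follow from quick arithmetic; a sanity check (using 500 KB = 500,000 bytes to match the round numbers above):

```python
SECONDS_PER_DAY = 86_400

photos_per_day = 100_000_000
avg_photo_bytes = 500_000  # 500 KB, decimal, matching the estimates above

daily_storage_tb = photos_per_day * avg_photo_bytes / 1e12
yearly_storage_pb = daily_storage_tb * 365 / 1_000

feed_reads_per_day = 2_000_000_000
avg_read_qps = feed_reads_per_day / SECONDS_PER_DAY
upload_qps = photos_per_day / SECONDS_PER_DAY

print(f"{daily_storage_tb:.0f} TB/day, {yearly_storage_pb:.2f} PB/year")  # 50 TB/day, 18.25 PB/year
print(f"{avg_read_qps:,.0f} avg read QPS, {upload_qps:,.0f} upload QPS")  # 23,148 avg read QPS, 1,157 upload QPS
```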

High-Level Architecture

         [Mobile/Web Clients]
               │
          [API Gateway / Load Balancer]
               │
    ┌──────────┼──────────────┐
    │          │              │
[Upload    [Feed API]    [Social API]
 Service]      │         (follow/unfollow)
    │          │              │
    │     [Feed Cache]   [Social Graph]
    │      (Redis)        (Cassandra)
    │          │
[S3/GCS]  [Feed Generator]
 (raw)         │
    │     [Timeline DB]
[CDN]      (Redis/Cassandra)
(resized)      │
         [Post Service]
         (metadata DB)

Photo Upload Pipeline

# Upload flow:
# 1. Client requests signed URL from Upload Service
# 2. Client uploads directly to S3 (bypasses app servers)
# 3. S3 triggers notification to processing queue
# 4. Media Processor resizes (thumbnail, standard, high-res)
# 5. CDN edge nodes pull from S3 on first request

class UploadService:
    def get_upload_url(self, user_id: str, content_type: str) -> dict:
        photo_id = generate_snowflake_id()
        key = f"uploads/{user_id}/{photo_id}/original"
        signed_url = s3.generate_presigned_post(
            Bucket='raw-photos',
            Key=key,
            Fields={'Content-Type': content_type},
            Conditions=[['content-length-range', 1, 50_000_000]],  # Max 50MB
            ExpiresIn=300  # 5 minutes
        )
        return {'photo_id': photo_id, 'upload_url': signed_url}

    def confirm_upload(self, photo_id: str, user_id: str, caption: str,
                       hashtags: list[str]) -> dict:
        # Save metadata
        post = {
            'post_id': photo_id,
            'user_id': user_id,
            'caption': caption,
            'hashtags': hashtags,
            'created_at': datetime.utcnow(),
            'status': 'processing',
        }
        posts_db.insert(post)

        # Trigger async processing
        processing_queue.publish({
            'photo_id': photo_id,
            'user_id': user_id,
            's3_key': f"uploads/{user_id}/{photo_id}/original"
        })
        return {'post_id': photo_id, 'status': 'processing'}

class MediaProcessor:
    SIZES = {
        'thumbnail': (150, 150),
        'standard': (1080, 1080),
        'high_res': (2048, 2048),
    }

    def process(self, photo_id: str, user_id: str, s3_key: str):
        original = s3.download(s3_key)
        for size_name, dimensions in self.SIZES.items():
            resized = self._resize(original, dimensions)
            dest_key = f"photos/{photo_id}/{size_name}.jpg"
            s3.upload(dest_key, resized, content_type='image/jpeg')
            # No CDN invalidation needed: keys are new, and edge nodes
            # pull from S3 on the first request (step 5 above)

        # Mark post as published
        posts_db.update(photo_id, {'status': 'published'})
        # Fanout to followers
        fanout_queue.publish({'post_id': photo_id, 'user_id': user_id})

Feed Generation: Push vs Pull

Feed generation is the hardest design decision. Three approaches:

1. Push on Write (Fanout on Write)

class FanoutService:
    """
    When user A posts, pre-compute and push to all followers' feed caches.
    Pros: Feed reads are O(1) — just read from cache.
    Cons: Celebrities with 100M followers require 100M writes per post.
    """
    def fanout(self, post_id: str, author_id: str):
        followers = social_graph.get_followers(author_id)  # Could be 100M

        # Batch writes to Redis sorted sets (score = timestamp),
        # pipelined so each batch costs one round trip
        BATCH_SIZE = 1000
        for i in range(0, len(followers), BATCH_SIZE):
            batch = followers[i:i + BATCH_SIZE]
            pipe = feed_cache.pipeline(transaction=False)
            for follower_id in batch:
                pipe.zadd(
                    f"feed:{follower_id}",
                    {post_id: time.time()},
                    nx=True  # Don't overwrite an existing score
                )
                pipe.zremrangebyrank(f"feed:{follower_id}", 0, -1001)  # Keep newest 1000
            pipe.execute()

2. Pull on Read (Fanout on Read)

class FeedService:
    """
    When user loads feed, merge timelines from all followed accounts.
    Pros: No fanout cost for celebrities.
    Cons: Feed reads are expensive; latency proportional to # follows.
    """
    def get_feed(self, user_id: str, page: int = 0, limit: int = 20) -> list:
        following = social_graph.get_following(user_id)  # Users this person follows

        # Fetch recent posts from each account in parallel
        all_posts = []
        with ThreadPoolExecutor(max_workers=10) as executor:
            futures = {executor.submit(self._get_posts, uid): uid
                      for uid in following}
            for future in as_completed(futures):
                all_posts.extend(future.result())

        # Sort by timestamp, paginate
        all_posts.sort(key=lambda p: p['created_at'], reverse=True)
        return all_posts[page*limit:(page+1)*limit]

    def _get_posts(self, user_id: str, limit: int = 30) -> list:
        return posts_db.query(
            "SELECT * FROM posts WHERE user_id = ? ORDER BY created_at DESC LIMIT ?",
            [user_id, limit]
        )

3. Hybrid (Instagram’s Actual Approach)

"""
Hybrid strategy used by Instagram:
- Regular users (under ~10K followers): fanout on write (push)
- Celebrities (over ~10K followers): pull on read

Feed construction:
1. Read pre-computed feed from cache (regular users who posted recently)
2. Merge with on-demand pull from celebrity accounts the user follows
3. Apply ranking model (ML-based, not pure chronological)
"""
class HybridFeedService:
    CELEBRITY_THRESHOLD = 10_000

    def get_feed(self, user_id: str) -> list:
        # Read pre-computed feed from Redis
        cached_post_ids = feed_cache.zrevrange(f"feed:{user_id}", 0, 199)  # newest 200

        # Find celebrities the user follows
        following = social_graph.get_following(user_id)
        celebrities = [uid for uid in following
                      if social_graph.get_follower_count(uid) > self.CELEBRITY_THRESHOLD]

        # Pull celebrity posts on-demand
        celeb_posts = []
        for celeb_id in celebrities:
            celeb_posts.extend(posts_db.get_recent_posts(celeb_id, limit=10))

        # Merge and rank
        all_posts = posts_db.get_by_ids(cached_post_ids) + celeb_posts
        return self.ranking_model.rank(all_posts, user_id)

Social Graph Storage

# Follow/unfollow relationship storage
# Two tables for bidirectional lookup

# followers: who follows user X
CREATE TABLE followers (
    user_id    BIGINT,  -- The followed user
    follower_id BIGINT, -- The person following
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, follower_id)
);

# following: who user X follows
CREATE TABLE following (
    user_id     BIGINT,  -- The follower
    followee_id BIGINT,  -- The person being followed
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id, followee_id)
);

# For Instagram's scale: Cassandra with user_id as partition key
# Each partition holds all followers for that user
# Gossip protocol for peer discovery; consistent hashing for data distribution
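A follow must write one row per table so both lookup directions stay single-partition reads. A minimal sketch of that dual write (the `db.execute` session is a stand-in for a Cassandra driver session; in production the two inserts would go in a logged batch so they cannot diverge):

```python
from datetime import datetime, timezone

def follow(db, follower_id: int, followee_id: int) -> None:
    """Dual-write: one row per direction of the relationship."""
    now = datetime.now(timezone.utc)
    # Partition by the followed user: answers "who follows X?"
    db.execute(
        "INSERT INTO followers (user_id, follower_id, created_at) VALUES (%s, %s, %s)",
        (followee_id, follower_id, now),
    )
    # Partition by the follower: answers "who does X follow?"
    db.execute(
        "INSERT INTO following (user_id, followee_id, created_at) VALUES (%s, %s, %s)",
        (follower_id, followee_id, now),
    )

def unfollow(db, follower_id: int, followee_id: int) -> None:
    """Remove both rows; same keys, mirrored order."""
    db.execute("DELETE FROM followers WHERE user_id = %s AND follower_id = %s",
               (followee_id, follower_id))
    db.execute("DELETE FROM following WHERE user_id = %s AND followee_id = %s",
               (follower_id, followee_id))
```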

Data Model

Entity          Storage                    Reason
--------------  -------------------------  -----------------------------------------------------------------
Photos/Videos   S3 + CDN                   Object storage for large blobs; CDN for global low-latency reads
Post metadata   Cassandra                  High write volume; time-series access pattern (recent posts)
Social graph    Cassandra                  High read throughput; partition by user_id
Feed cache      Redis sorted sets          O(log n) insert/read; score = timestamp
User profiles   PostgreSQL                 Relational; strong consistency for profile data
Likes/Counts    Redis counter + async DB   Atomic INCR for counters; async flush to DB
Search index    Elasticsearch              Full-text search for captions, hashtags, usernames
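The "Likes/Counts" row is worth sketching: atomic INCR on the hot path, with a background job draining dirty counters to the durable store. All class and key names below are illustrative, and FakeRedis is an in-memory stand-in so the sketch runs without a server; a real deployment would use redis-py and persist the result of flush() to the posts DB.

```python
class FakeRedis:
    """In-memory stand-in mirroring the few redis-py calls used below."""
    def __init__(self):
        self.sets, self.counters = {}, {}
    def sadd(self, key, member):
        added = member not in self.sets.setdefault(key, set())
        self.sets[key].add(member)
        return int(added)  # 1 only if newly added, like Redis SADD
    def incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]
    def get(self, key):
        return self.counters.get(key)
    def smembers(self, key):
        return set(self.sets.get(key, ()))
    def delete(self, key):
        self.sets.pop(key, None)

class LikeCounter:
    """Atomic INCR on the hot path; flush() drains dirty post ids
    so a background job can persist counts to the posts DB."""
    def __init__(self, redis_client):
        self.redis = redis_client

    def like(self, post_id: str, user_id: str) -> None:
        # SADD returns 1 only for a first-time like, making likes idempotent
        if self.redis.sadd(f"likers:{post_id}", user_id):
            self.redis.incr(f"likes:{post_id}")
            self.redis.sadd("dirty_posts", post_id)  # mark for next flush

    def flush(self) -> dict:
        # Background job: collect dirty counts for persistence, then clear
        counts = {pid: int(self.redis.get(f"likes:{pid}") or 0)
                  for pid in self.redis.smembers("dirty_posts")}
        self.redis.delete("dirty_posts")
        return counts
```

Between flushes the DB count lags the Redis count, which is acceptable under the eventual-consistency requirement stated earlier.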

Interview Discussion Points

  • Feed ranking: Pure chronological → engagement signals (likes, saves, watch time) → ML ranking model (Instagram uses a multi-stage funnel: recall → ranking → reranking)
  • Stories vs posts: Stories expire in 24h (TTL in Redis/Cassandra); different read pattern (pull on open, not in main feed)
  • Global distribution: Multi-region deployment; user data follows user’s home region; CDN edge serves media; feed generation is regional
  • Explore page: Collaborative filtering + content signals; offline ML training; candidate generation + ranking
  • Anti-abuse: Rate limiting on uploads; perceptual hashing (pHash) to detect duplicate/spam content
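To illustrate the perceptual-hashing point, here is a toy difference hash (dHash). A real system would first resize and grayscale the image (e.g. with Pillow) to an 8x9 grid; this sketch works directly on such a grid, and the sample grids are made up for demonstration:

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash over an 8x9 grayscale grid: one bit per
    horizontally adjacent pair (1 if the left pixel is brighter)."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Differing bits between two hashes; a small distance
    (e.g. <= 5 of 64) suggests a near-duplicate image."""
    return bin(a ^ b).count("1")

# A left-to-right brightening gradient, and its mirror image
grad = [[c * 28 for c in range(9)] for _ in range(8)]
mirror = [list(reversed(row)) for row in grad]
print(hamming(dhash(grad), dhash(grad)))    # 0  -> identical
print(hamming(dhash(grad), dhash(mirror)))  # 64 -> completely different
```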

