System Design Interview: Design a Video Streaming Platform (YouTube/Netflix)

What Is a Video Streaming Platform?

A video streaming platform stores, processes, and delivers video content to millions of concurrent viewers. Examples: YouTube (500 hours of video uploaded per minute), Netflix (200M+ subscribers), Twitch (live streaming). Core challenges: video transcoding at scale, adaptive bitrate streaming, CDN delivery, and minimizing startup latency and buffering.

    System Requirements

    Functional

    • Upload video: ingest raw video, transcode to multiple resolutions
    • Stream video: adaptive bitrate based on network conditions
    • Search and browse video catalog
    • Track view counts, watch history, recommendations

    Non-Functional

    • 500 hours uploaded per minute; 1B daily views
    • Startup latency <2 seconds globally
    • Seamless quality adaptation during playback

    Upload and Transcoding Pipeline

    User upload ──► Upload Service ──► Raw video in S3
                                            │
                                   Transcoding Queue (SQS)
                                            │
                                 Transcoding Workers (FFmpeg)
                               ┌────────────┴──────────────┐
                               ▼                           ▼
                      Multiple renditions:          Thumbnail extraction
                      1080p, 720p, 480p,            (sample frames)
                      360p, 240p in HLS/DASH
                               │
                        CDN origin (S3/GCS)
    

    Transcoding is CPU-intensive. A 10-minute 4K video takes ~5 minutes to transcode on a single core. Parallelize: split the video into 1-minute segments, transcode the segments in parallel across workers, then reassemble. Run workers on spot instances for cost efficiency. Store each rendition as HLS (HTTP Live Streaming) segments: 2-second .ts chunks plus a .m3u8 manifest file listing all chunks.
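    The split-transcode-reassemble step can be sketched as below. This is a simulation of the fan-out pattern, not a real encoder: `transcode_segment` is a stub standing in for an ffmpeg invocation, and the segment length and worker count are illustrative assumptions.

```python
# Segment-parallel transcoding sketch. In a real worker, transcode_segment
# would shell out to ffmpeg with -ss/-to to encode just that time range.
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 60  # split the source into 1-minute segments

def segment_ranges(duration_s: int):
    """Yield (start, end) second-ranges covering the full video."""
    for start in range(0, duration_s, SEGMENT_SECONDS):
        yield start, min(start + SEGMENT_SECONDS, duration_s)

def transcode_segment(rng, rendition="720p"):
    # Placeholder for: ffmpeg -ss {start} -to {end} -i src ... {out}.ts
    start, end = rng
    return f"{rendition}/{start:06d}-{end:06d}.ts"

def transcode(duration_s: int, rendition="720p"):
    ranges = list(segment_ranges(duration_s))
    with ThreadPoolExecutor(max_workers=8) as pool:
        # map preserves input order, so the parts come back ready to
        # concatenate / list in the rendition's manifest
        return list(pool.map(lambda r: transcode_segment(r, rendition), ranges))

parts = transcode(150)  # a 2.5-minute video -> 3 segments
```

    Because segments are independent, a 60-minute video spread across 60 workers finishes in roughly the time of one segment plus reassembly, which is why the split matters at upload scale.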

    Adaptive Bitrate Streaming (ABR)

    The video player downloads a master manifest (.m3u8) listing available quality levels. The player measures download bandwidth for each 2-second chunk. If download is fast (bandwidth > bitrate): switch to higher quality next chunk. If download is slow: switch to lower quality. This happens automatically, seamlessly, mid-stream. The user gets the highest quality their connection supports without buffering.

    # Master manifest (m3u8)
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    720p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
    480p/playlist.m3u8
    
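    The player's quality switch can be sketched as a simple heuristic: pick the highest-bitrate rendition whose BANDWIDTH fits under the measured throughput. The rendition table mirrors the master manifest above; the 0.8 safety margin is an illustrative assumption (real players use smoothed throughput estimates and buffer-level signals).

```python
# Minimal client-side ABR selection: highest rendition that fits under
# measured throughput, with a safety margin to absorb variance.
RENDITIONS = [  # (name, required bits/sec) from #EXT-X-STREAM-INF BANDWIDTH
    ("1080p", 5_000_000),
    ("720p", 2_500_000),
    ("480p", 1_000_000),
]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Choose the best quality the connection can sustain."""
    budget = measured_bps * safety
    for name, required in RENDITIONS:  # sorted high -> low
        if required <= budget:
            return name
    return RENDITIONS[-1][0]  # never drop below the lowest rendition

pick_rendition(8_000_000)  # fast connection -> "1080p"
pick_rendition(1_500_000)  # constrained    -> "480p"
```

    The player re-runs this after every chunk download, which is what makes the adaptation continuous and invisible to the user.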

    CDN Architecture

    Video segments are large and cacheable. 95%+ of traffic is served from CDN edge nodes. Upload-to-CDN pipeline: after transcoding, push segments to the CDN origin (S3 bucket). CDN edge nodes (Cloudflare, Akamai, AWS CloudFront) cache segments at PoPs globally. First viewer in a region misses the cache (cold start); all subsequent viewers hit the edge cache. For popular videos: CDN cache hit rate approaches 100%. The origin (S3) only handles the first viewer per edge node.

    Video Metadata Service

    videos: id, creator_id, title, description, duration, status,
            thumbnail_url, created_at, view_count
    video_renditions: video_id, resolution, bitrate, manifest_url, size_bytes
    

    Store metadata in a relational DB (PostgreSQL). view_count updated asynchronously via a Kafka consumer — do not update on every view request (too much write amplification). Batch increment view counts every 60 seconds.
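    The batched-increment path can be sketched as below. Kafka consumption and the SQL call are stubbed out; the point is the shape: accumulate increments in memory, then issue one UPDATE per video per flush interval instead of one write per view.

```python
# Batched view-count consumer sketch: N views become one DB write.
from collections import Counter

class ViewCountBatcher:
    def __init__(self, flush_fn):
        self.pending = Counter()
        self.flush_fn = flush_fn  # e.g. UPDATE videos SET view_count = view_count + %s

    def on_view(self, video_id: str):
        self.pending[video_id] += 1  # in-memory only, no DB write per view

    def flush(self):
        # in the real consumer, called every 60 seconds by a timer
        for video_id, n in self.pending.items():
            self.flush_fn(video_id, n)
        self.pending.clear()

writes = []
batcher = ViewCountBatcher(lambda vid, n: writes.append((vid, n)))
for _ in range(10_000):
    batcher.on_view("v42")
batcher.flush()  # writes == [("v42", 10000)]: one UPDATE instead of 10,000
```

    The trade-off is that counts lag by up to the flush interval and an in-flight batch can be lost on a crash, which is acceptable for a display counter.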

    Resumable Uploads

    Large video files (1GB+) need resumable uploads to handle network interruptions. Protocol: initialize an upload session, get a session URL. Upload in 5MB chunks with byte range headers. Server tracks the last acknowledged byte. On network failure: resume from the last byte. This is the protocol used by YouTube Data API and GCS resumable uploads.
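    The server side of that protocol can be sketched as a session that tracks the last acknowledged byte and only accepts a chunk starting exactly there. The class, its in-memory storage, and the 308-style responses are illustrative assumptions modeled on the resumable-upload pattern, not a drop-in implementation.

```python
# Resumable-upload session sketch: the server tracks the last acknowledged
# byte; a chunk with the wrong start offset is rejected and the client is
# told where to resume (Content-Range style).
CHUNK = 5 * 1024 * 1024  # clients upload in 5MB chunks

class UploadSession:
    def __init__(self, total_bytes: int):
        self.total = total_bytes
        self.received = 0   # last acknowledged byte offset
        self.parts = []

    def put_chunk(self, start: int, data: bytes):
        if start != self.received:
            # client resumed from the wrong offset: report where we are
            return {"status": 308, "range_end": self.received}
        self.parts.append(data)
        self.received += len(data)
        done = self.received == self.total
        return {"status": 200 if done else 308, "range_end": self.received}

session = UploadSession(total_bytes=12)
r1 = session.put_chunk(0, b"hello ")                 # accepted, 6 bytes in
retry = session.put_chunk(0, b"hello ")              # network retry: rejected, told offset 6
r2 = session.put_chunk(retry["range_end"], b"world!")  # resume from byte 6
```

    After a network failure, the client simply asks the server for `range_end` and resumes from there, so a dropped connection at byte 900MB of a 1GB file costs one chunk, not the whole upload.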

    Recommendations

    Two-stage pipeline: candidate retrieval (collaborative filtering: users who watched this also watched X) → ranking (ML model scoring candidates by predicted watch probability, weighted by recency and diversity). Store user watch history in Cassandra (write-heavy, time-series). Train recommendation models offline (daily batch), serve from a feature store with real-time features (what did the user watch in the last hour).
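    The two stages can be sketched as below. The co-watch table and the scoring stub are fabricated illustrative data; the point is the shape of the pipeline: a cheap, high-recall retrieval pass followed by an expensive model scoring only the small candidate set.

```python
# Two-stage recommendation sketch: retrieve by co-watch counts, then rank
# with a (stubbed) predicted-watch-probability model.
from collections import Counter

# co_watch[v] = Counter of videos watched by users who also watched v
co_watch = {
    "v1": Counter({"v2": 90, "v3": 40, "v4": 5}),
}

def retrieve_candidates(video_id: str, k: int = 100):
    """Stage 1: cheap, high-recall candidate set from co-watch data."""
    return [v for v, _ in co_watch.get(video_id, Counter()).most_common(k)]

def rank(candidates, watch_probability):
    """Stage 2: the expensive model scores only the small candidate set."""
    return sorted(candidates, key=watch_probability, reverse=True)

# stand-in for the ML model's predicted watch probability
scores = {"v2": 0.3, "v3": 0.7, "v4": 0.1}
recs = rank(retrieve_candidates("v1"), watch_probability=lambda v: scores[v])
# recs == ["v3", "v2", "v4"]: retrieval order and final rank can differ
```

    Splitting the stages is what makes this tractable: retrieval narrows millions of videos to hundreds, so the ranking model's per-item cost stays off the hot path.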

    Live Streaming Differences

    Live streaming adds latency constraints: standard HLS has 15-30 seconds of latency (segment-duration buffering). Low-Latency HLS (LL-HLS): 2-3 seconds. WebRTC: sub-second. Live segments are not cached aggressively — they expire in seconds. The ingest path: streamer → RTMP → ingest server → transcode on the fly → push to CDN → viewers.

    Interview Tips

    • HLS segments + CDN is the core architecture — describe it early.
    • Transcoding parallelism (split into segments) shows depth.
    • ABR is a client-side algorithm — the server just provides multiple renditions.
    • view_count batching via Kafka avoids DB write amplification.