System Design Interview: Design a Video Streaming Platform (YouTube/Netflix)

What Is a Video Streaming Platform?

A video streaming platform stores, processes, and delivers video content to millions of concurrent viewers. Examples: YouTube (500 hours of video uploaded per minute), Netflix (200M+ subscribers), Twitch (live streaming). Core challenges: video transcoding at scale, adaptive bitrate streaming, CDN delivery, and minimizing startup latency and buffering.

    System Requirements

    Functional

    • Upload video: ingest raw video, transcode to multiple resolutions
    • Stream video: adaptive bitrate based on network conditions
    • Search and browse video catalog
    • Track view counts, watch history, recommendations

    Non-Functional

    • 500 hours uploaded per minute; 1B daily views
    • Startup latency <2 seconds globally
    • Seamless quality adaptation during playback

    Upload and Transcoding Pipeline

    User upload ──► Upload Service ──► Raw video in S3
                                            │
                                   Transcoding Queue (SQS)
                                            │
                                 Transcoding Workers (FFmpeg)
                               ┌────────────┴──────────────┐
                               ▼                           ▼
                      Multiple renditions:          Thumbnail extraction
                      1080p, 720p, 480p,            (sample frames)
                      360p, 240p in HLS/DASH
                               │
                        CDN origin (S3/GCS)
    

    Transcoding is CPU-intensive. A 10-minute 4K video takes ~5 minutes to transcode on a single core. Parallelize: split video into 1-minute segments, transcode segments in parallel across workers, reassemble. Spot instances for cost efficiency. Store each rendition as HLS (HTTP Live Streaming) segments: 2-second .ts chunks + a .m3u8 manifest file listing all chunks.
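    The split-and-parallelize step can be sketched in Python. This is a minimal illustration, not a production pipeline: the FFmpeg invocation is stubbed out, and `transcode_segment`, the worker count, and the file-naming scheme are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def segment_ranges(duration_s: int, segment_s: int = 60):
    """Split a video of duration_s seconds into (start, end) ranges
    of at most segment_s seconds each, for parallel transcoding."""
    return [(start, min(start + segment_s, duration_s))
            for start in range(0, duration_s, segment_s)]

def transcode_segment(rng, rendition):
    # In production this would shell out to FFmpeg with -ss/-to flags;
    # here we just return the name of the segment a worker would produce.
    start, _end = rng
    return f"{rendition}/seg_{start:05d}.ts"

def transcode_video(duration_s, renditions, workers=8):
    """Fan every (segment, rendition) pair out across a worker pool,
    then return the ordered segment list per rendition (ready to be
    assembled into an HLS playlist)."""
    ranges = segment_ranges(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {
            r: list(pool.map(lambda rng: transcode_segment(rng, r), ranges))
            for r in renditions
        }
```

    Because `pool.map` preserves input order, the reassembly step is trivial: the returned segment lists are already in playback order.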

    Adaptive Bitrate Streaming (ABR)

    The video player downloads a master manifest (.m3u8) listing the available quality levels, then measures download bandwidth for each 2-second chunk. If the download is fast (bandwidth > bitrate), it switches to a higher quality for the next chunk; if slow, it switches to a lower one. This happens automatically and seamlessly mid-stream: the user gets the highest quality their connection supports without buffering.

    # Master manifest (m3u8)
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    720p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
    480p/playlist.m3u8
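    A minimal client-side selection heuristic over the renditions in this manifest might look like the following. The 0.8 safety factor and the switch-every-chunk policy are illustrative assumptions; real players also weigh buffer occupancy and smooth the bandwidth estimate.

```python
# Renditions from the master manifest above: (label, bitrate in bits/s),
# sorted from highest to lowest quality.
RENDITIONS = [("1080p", 5_000_000), ("720p", 2_500_000), ("480p", 1_000_000)]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Choose the highest-bitrate rendition whose bitrate fits within a
    safety fraction of the measured throughput; fall back to the lowest."""
    for label, bitrate in RENDITIONS:
        if bitrate <= measured_bps * safety:
            return label
    return RENDITIONS[-1][0]
```

    The player would call this after each chunk download with the throughput it just observed, so quality adapts at chunk granularity.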
    

    CDN Architecture

    Video segments are large and cacheable. 95%+ of traffic is served from CDN edge nodes. Upload-to-CDN pipeline: after transcoding, push segments to the CDN origin (S3 bucket). CDN edge nodes (Cloudflare, Akamai, AWS CloudFront) cache segments at PoPs globally. First viewer in a region misses the cache (cold start); all subsequent viewers hit the edge cache. For popular videos: CDN cache hit rate approaches 100%. The origin (S3) only handles the first viewer per edge node.
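    The cold-start behavior can be modeled with a toy edge cache: the first request for a segment key counts as an origin fetch, and every later request is a hit. This is a deliberately simplified model (no TTLs, no eviction) just to show why origin load is proportional to PoP count, not viewer count.

```python
class EdgeCache:
    """Toy model of one CDN PoP: the first request per segment key
    misses and is counted as an origin (S3) fetch; later requests hit."""
    def __init__(self):
        self.store = {}
        self.origin_fetches = 0

    def get(self, key):
        if key not in self.store:
            self.origin_fetches += 1          # cache fill from the origin
            self.store[key] = f"bytes-of-{key}"
        return self.store[key]
```

    A million viewers behind one PoP generate exactly one origin fetch per segment; fifty PoPs generate fifty.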

    Video Metadata Service

    videos: id, creator_id, title, description, duration, status,
            thumbnail_url, created_at, view_count
    video_renditions: video_id, resolution, bitrate, manifest_url, size_bytes
    

    Store metadata in a relational DB (PostgreSQL). view_count updated asynchronously via a Kafka consumer — do not update on every view request (too much write amplification). Batch increment view counts every 60 seconds.
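    The batching consumer can be sketched as an in-memory counter that is flushed on a timer. The `flush` callback and the SQL in the comment are illustrative; in practice this would sit inside a Kafka consumer loop with a 60-second flush interval.

```python
from collections import Counter

class ViewCountBatcher:
    """Accumulate view events in memory (as a Kafka consumer would)
    and emit one batched increment per video on each flush, instead
    of one DB write per view."""
    def __init__(self, flush):
        self.pending = Counter()
        self.flush_fn = flush    # e.g. runs the UPDATE in the comment below

    def record_view(self, video_id):
        self.pending[video_id] += 1

    def flush(self):
        for video_id, n in self.pending.items():
            # UPDATE videos SET view_count = view_count + %s WHERE id = %s
            self.flush_fn(video_id, n)
        self.pending.clear()
```

    A thousand views of one video in a flush window become a single `+1000` write, which is the write-amplification win described above.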

    Resumable Uploads

    Large video files (1GB+) need resumable uploads to handle network interruptions. Protocol: initialize an upload session, get a session URL. Upload in 5MB chunks with byte range headers. Server tracks the last acknowledged byte. On network failure: resume from the last byte. This is the protocol used by YouTube Data API and GCS resumable uploads.
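    The server-side bookkeeping reduces to tracking the last acknowledged byte. A minimal in-memory sketch, assuming sequential 5MB chunks (real implementations key this off `Content-Range` headers and persist the offset):

```python
class UploadSession:
    """Server side of a resumable upload: accept chunks by byte offset,
    advance the last acknowledged byte, and tell the client where to
    resume if a chunk arrives out of order (e.g. a retransmit)."""
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.received = 0            # last acknowledged byte offset
        self.buf = bytearray()

    def put_chunk(self, offset, data):
        if offset != self.received:  # stale or out-of-order chunk
            return self.received     # client resumes from this offset
        self.buf.extend(data)
        self.received += len(data)
        return self.received

    @property
    def complete(self):
        return self.received == self.total
```

    After a network failure, the client asks the server for `received` and resumes from that byte instead of restarting the whole 1GB upload.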

    Recommendations

    Two-stage pipeline: candidate retrieval (collaborative filtering: users who watched this also watched X) → ranking (ML model scoring candidates by predicted watch probability, weighted by recency and diversity). Store user watch history in Cassandra (write-heavy, time-series). Train recommendation models offline (daily batch), serve from a feature store with real-time features (what did the user watch in the last hour).
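    The candidate-retrieval stage ("users who watched this also watched X") can be illustrated with a co-watch counter over per-user histories. This is a toy version of collaborative filtering; production systems precompute these counts offline rather than scanning histories at request time.

```python
from collections import Counter

def co_watch_candidates(watch_history, seed_video, k=3):
    """Candidate retrieval: among users who watched seed_video,
    count what else they watched and return the top-k co-watched videos.
    watch_history maps user_id -> list of watched video ids."""
    counts = Counter()
    for videos in watch_history.values():
        if seed_video in videos:
            counts.update(v for v in videos if v != seed_video)
    return [v for v, _ in counts.most_common(k)]
```

    The ranking stage would then score these candidates with the ML model rather than serving them in raw co-watch order.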

    Live Streaming Differences

    Live streaming adds latency constraints: HLS has 15-30 second latency (segment duration buffering). Low-latency HLS (LLHLS): 2-3 seconds. WebRTC: sub-second. Live segments are not cached aggressively — they expire in seconds. The ingest path: streamer → RTMP → ingest server → transcode on the fly → push to CDN → viewers.

    Interview Tips

    • HLS segments + CDN is the core architecture — describe it early.
    • Transcoding parallelism (split into segments) shows depth.
    • ABR is a client-side algorithm — the server just provides multiple renditions.
    • view_count batching via Kafka avoids DB write amplification.

    Frequently Asked Questions

    How does adaptive bitrate streaming (HLS) work and why does it prevent buffering?

    HLS (HTTP Live Streaming) divides a video into small segments (2-4 seconds each) encoded at multiple bitrates (e.g., 240p at 400Kbps, 720p at 2.5Mbps, 1080p at 5Mbps). The player downloads a master manifest listing all available quality levels, selects a quality playlist, and begins downloading segments sequentially over plain HTTP. After each segment download, the player measures the actual throughput. If throughput comfortably exceeds the current bitrate, it switches to a higher quality level for the next segment; if throughput drops, it switches lower. This segment-level adaptation means quality changes happen every 2-4 seconds — never mid-frame. Why it prevents buffering: the player maintains a buffer (e.g., 30 seconds of video ahead), and if it detects bandwidth dropping it switches to a lower bitrate before the buffer drains. The buffer acts as a shock absorber. Contrast with progressive download of a single file: no quality adaptation, and playback stalls completely on bandwidth drops. HLS and DASH (used by YouTube and Netflix) enable smooth streaming on variable connections.

    How do you design the video transcoding pipeline to handle 500 hours uploaded per minute?

    At 500 hours/minute, assuming average 30-minute videos at 1GB raw: 1000 videos/minute, or about 17 uploads/second. Each video needs 5 renditions (1080p, 720p, 480p, 360p, 240p). With single-threaded transcoding, a 30-minute 1080p video takes ~30 minutes to transcode — a throughput of 1 video per 30 minutes per worker, so keeping up requires 1000 * 30 = 30,000 worker-minutes per minute of uploads. Solutions: (1) Split each video into 1-minute segments (30 per video), transcode all segments in parallel across workers, and reassemble into one HLS playlist. (2) Auto-scale transcoding workers (EC2 Spot instances: ~70% cheaper). (3) Priority queue: transcode shorter videos first (better user experience); longer videos wait. (4) Tiered transcoding: immediately transcode 360p (lowest quality, fastest) so the video is viewable within ~2 minutes, with higher-quality renditions following. (5) Codec acceleration: GPU-accelerated encoding (NVENC for H.264/H.265) is 5-10x faster than CPU. At scale, this pipeline keeps per-video transcoding lag under 5 minutes even at 500 hours/minute.

    How does a CDN serve video at global scale and what happens on a cache miss?

    A CDN (Content Delivery Network) has Points of Presence (PoPs) in 200+ cities globally, each caching video segments. When a viewer in Tokyo requests a segment, DNS resolves to the nearest Tokyo CDN edge. The edge checks its cache: on a hit, it serves the segment directly with no origin request; on a miss, it fetches the segment from the origin (an S3 bucket in us-east-1), caches it, and serves it. Subsequent Tokyo viewers get the cached version. Cache hit rate for popular videos is 95%+, and the origin only handles the first viewer per edge per segment. For a viral video with 1M concurrent viewers spread across 50 PoPs, S3 handles ~50 requests per segment (one cold start per PoP): 50 fills * 5MB/segment = 250MB from S3, while the CDN handles the other 999,950 requests. CDN cost model: charge per GB delivered from edge (cheap) plus per GB transferred from origin to CDN (more expensive), so maximizing cache hit rate minimizes origin cost. For 4K catalogs too large to fit in every PoP, use tiered CDN caching (a regional cache above the local edge caches).
