System Design Interview: Video Streaming Platform (Netflix/YouTube)

Designing a video streaming platform like Netflix or YouTube is one of the most comprehensive system design challenges, combining video processing pipelines, CDN architecture, adaptive bitrate streaming, and personalization at massive scale.

Core Requirements

Functional: Upload videos, transcode to multiple resolutions/formats, stream to users on any device with adaptive quality, support search and recommendations, track view history and progress.

Non-functional: 200M daily active users. 1B hours of video watched per day. Upload processing within 30 minutes. Stream start time < 2 seconds. 99.99% availability. Support 4K, 1080p, 720p, 480p, 360p.

Video Upload and Processing Pipeline

Upload flow:
  Creator → Upload API → S3 (raw video, presigned PUT URL)
                      → SQS: "video uploaded" event
                      → Video Processing Worker

Processing pipeline stages:
  1. Validation: check video format, duration, file integrity (SHA-256)
  2. Transcoding: convert to multiple resolutions and formats
     Tool: FFmpeg (open source) or cloud services (AWS MediaConvert)
     Output per video:
       360p H.264 MP4  (mobile, low bandwidth)
       480p H.264 MP4
       720p H.264 MP4  (standard HD)
       1080p H.264 MP4
       1080p H.265 HEVC (50% smaller than H.264)
       4K H.265         (if source is 4K)
       Audio-only AAC   (for background play)
  3. Thumbnail generation: extract frames at 10s intervals
  4. Content moderation: run ML classifier (NSFW, copyright)
  5. Packaging: segment into chunks for adaptive streaming (HLS/DASH)
  6. CDN distribution: push to edge PoPs
  7. Update DB: mark video as available, publish event
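Stage 1's integrity check can be sketched in a few lines of Python; reading in chunks keeps memory flat even for multi-gigabyte uploads (the 1 MiB chunk size and function name are illustrative choices, not a prescribed API):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so a multi-GB upload never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The worker compares this digest against the checksum the client supplied at upload time and rejects the video on mismatch before spending any transcoding CPU.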

Distributed transcoding:
  Split 2-hour movie into 5-minute segments
  Transcode each segment in parallel across N workers
  Merge segments → final output
  Reduces 2-hour 4K transcode from 8 hours → 30 minutes
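A back-of-the-envelope model of the speedup, assuming a roughly 4x-realtime 4K encode cost (inferred from the 8-hour single-machine figure above) and a fixed merge overhead:

```python
import math

def parallel_transcode_minutes(movie_min: int, seg_min: int,
                               encode_slowdown: float, workers: int,
                               merge_min: float = 5.0) -> float:
    """Wall-clock estimate: segments are encoded in worker-sized batches,
    then stitched back together (merge_min is an assumed fixed cost)."""
    segments = math.ceil(movie_min / seg_min)
    batches = math.ceil(segments / workers)
    return batches * seg_min * encode_slowdown + merge_min

# 2-hour movie, 5-min segments, 4x-realtime encode:
#   1 worker  -> ~8 hours of encoding
#   24 workers -> one 20-minute batch plus merge time
```

The win saturates once workers >= segments; beyond that, only a faster codec or shorter segments helps.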

Adaptive Bitrate Streaming (ABR)

Problem: users have different bandwidths; static quality = bad UX
         (buffering on slow connections, blurry on fast connections)

ABR: client switches quality mid-stream based on current bandwidth

HLS (HTTP Live Streaming) — Apple, widely supported:
  Master playlist (m3u8):
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
    360p/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
    720p/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p/index.m3u8

  Each quality-level playlist (360p/index.m3u8):
    #EXTM3U
    #EXT-X-TARGETDURATION:6
    #EXTINF:6.0,
    segment_001.ts
    #EXTINF:6.0,
    segment_002.ts
    ...

  Client logic:
    - Download master playlist → pick initial quality based on bandwidth estimate
    - Download segments → measure download speed
    - Buffer < 10s or falling throughput: switch to lower quality
    - Buffer > 20s and high measured speed: switch to higher quality
    - Target: maintain 20-30s buffer for smooth playback
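The client logic above can be sketched as a small selection function. The bitrate ladder comes from the master playlist; the 10s/20s buffer thresholds and the 0.8 bandwidth-headroom factor are common heuristics, not part of the HLS spec:

```python
# Bitrates from the master playlist above (bits per second -> rendition).
LADDER = [(800_000, "360p"), (2_800_000, "720p"), (5_000_000, "1080p")]

def pick_quality(measured_bps: float, buffer_secs: float,
                 headroom: float = 0.8) -> str:
    """Choose the highest rendition whose bitrate fits within a safety
    margin of the measured throughput; fall back when the buffer is low."""
    if buffer_secs < 10:          # drain risk: take the safest rendition
        return LADDER[0][1]
    budget = measured_bps * headroom
    best = LADDER[0][1]
    for bitrate, name in LADDER:  # ladder is sorted ascending by bitrate
        if bitrate <= budget:
            best = name
    return best
```

Real players (e.g. hls.js, ExoPlayer) smooth the bandwidth estimate over several segments to avoid oscillating between renditions.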

CDN Architecture

Video storage distribution:
  Origin: S3 (all video segments, all qualities)
  CDN (CloudFront / Akamai / Fastly):
    300+ PoPs worldwide
    Each PoP caches popular video segments
    Cache hit ratio: ~80% (hot content cached at edge)
    Cache miss → origin pull → cache at edge (5-10s penalty)

Cache key: {video_id}/{quality}/{segment_number}.ts
  e.g., cdn.netflix.com/v/xyz123/1080p/segment_042.ts

Popular content (top 10%): pre-warmed at all PoPs
  When video is trending → push-based distribution to edge

Long-tail content: served from nearest PoP or origin on-demand
  First viewer in a region pulls from origin → cached for next viewers

Cache TTL:
  Video segments (immutable): Cache-Control: max-age=31536000 (1 year)
  Playlists (can change): Cache-Control: max-age=5 (live) or 60 (VOD)
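The cache-key layout and TTL rules above are mechanical enough to sketch as helpers (function names and the URL host are illustrative; only the path layout and max-age values come from the scheme above):

```python
def segment_url(cdn_host: str, video_id: str, quality: str, seg: int) -> str:
    """Cache key layout from above: {video_id}/{quality}/{segment_number}.ts"""
    return f"https://{cdn_host}/v/{video_id}/{quality}/segment_{seg:03d}.ts"

def cache_control(resource: str, live: bool = False) -> str:
    """Segments are immutable -> cache for a year; playlists may change."""
    if resource.endswith(".ts"):
        return "Cache-Control: max-age=31536000, immutable"
    return f"Cache-Control: max-age={5 if live else 60}"
```

Because every segment URL is unique and never rewritten, there is no invalidation problem: a re-encoded video gets a new video_id (or version prefix) rather than overwriting cached keys.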

Video Storage Architecture

Storage tiers by access frequency:

Hot tier (< 30 days, frequently accessed):
  S3 Standard
Warm tier (30-180 days):
  S3 Standard-IA (Infrequent Access)
Cold tier (> 180 days):
  S3 Glacier
Archive tier (> 2 years, rarely accessed):
  S3 Glacier Deep Archive
  Cost: $0.00099/GB/month, 12-48 hour retrieval

Lifecycle policy (automated):
  Day 0: S3 Standard
  Day 30: → S3-IA
  Day 180: → Glacier
  Day 730: → Glacier Deep Archive
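The lifecycle policy above is a simple age-to-tier mapping; a sketch using the S3 API's actual StorageClass identifiers (the function itself is illustrative — in practice this is declared as an S3 lifecycle rule, not application code):

```python
def storage_class(age_days: int) -> str:
    """Mirror of the lifecycle policy above (thresholds in days)."""
    if age_days >= 730:
        return "DEEP_ARCHIVE"
    if age_days >= 180:
        return "GLACIER"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"
```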

Storage at YouTube scale:
  500 hours of video uploaded per minute
  Avg 1 hour = 2GB raw = 1GB after transcoding (all qualities)
  500 GB/min = 720TB/day = 263PB/year
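The arithmetic behind those figures, spelled out:

```python
UPLOAD_HOURS_PER_MIN = 500
GB_PER_HOUR = 1            # post-transcode, all qualities combined

gb_per_min = UPLOAD_HOURS_PER_MIN * GB_PER_HOUR   # 500 GB/min
tb_per_day = gb_per_min * 60 * 24 / 1000          # 720 TB/day
pb_per_year = tb_per_day * 365 / 1000             # ~263 PB/year
```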

Database Design

videos table (PostgreSQL/Spanner):
  id            UUID PRIMARY KEY
  title         TEXT
  description   TEXT
  creator_id    BIGINT
  status        ENUM(processing, published, removed)
  duration_sec  INT
  view_count    BIGINT DEFAULT 0
  published_at  TIMESTAMPTZ
  INDEX (creator_id, published_at DESC)
  INDEX (status, published_at DESC)  -- for feed queries

video_qualities table:
  video_id      UUID
  quality       ENUM(360p, 480p, 720p, 1080p, 4K)
  storage_path  TEXT  -- S3 key
  size_bytes    BIGINT
  PRIMARY KEY (video_id, quality)

view_events (ClickHouse / BigQuery — analytics):
  event_id    UUID
  video_id    UUID
  user_id     BIGINT
  started_at  TIMESTAMPTZ
  watch_secs  INT
  quality     TEXT
  device      TEXT
  country     TEXT

user_watch_history (Cassandra — high write rate):
  user_id      BIGINT PARTITION KEY
  watched_at   TIMESTAMP CLUSTERING KEY (DESC)  -- newest first
  video_id     UUID
  progress_sec INT  -- resume position

View Counter at Scale

Problem: 1B views/day = 11,600 views/sec
         Incrementing DB view_count per view → DB bottleneck

Solution: counter aggregation
  Redis INCR video:views:{video_id}  (atomic, fast)
  Background job every 30s, flushing counters to the DB in batch:
    - GETSET video:views:{video_id} 0  (atomic read + reset)
    - UPDATE videos SET view_count = view_count + {delta} WHERE id = ?
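A minimal sketch of the aggregation pattern, using an in-process dict as a stand-in for Redis so it runs without a server (a real deployment would use redis-py's INCR/GETSET against a Redis cluster, and the commented-out UPDATE would run against the videos table):

```python
class CounterBuffer:
    """In-memory stand-in for the Redis counter pattern described above."""

    def __init__(self):
        self._counts = {}

    def incr(self, video_id: str) -> None:     # hot path, once per view
        self._counts[video_id] = self._counts.get(video_id, 0) + 1

    def flush(self) -> dict:
        """Atomically read-and-reset, then apply deltas to the DB in batch."""
        deltas, self._counts = self._counts, {}
        # for vid, d in deltas.items():
        #     db.execute("UPDATE videos SET view_count = view_count + %s "
        #                "WHERE id = %s", (d, vid))
        return deltas
```

The trade-off: up to 30 seconds of counts can be lost if the counter store crashes between flushes, which is acceptable for view counts but would not be for billing.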

Approximate counts (YouTube approach):
  Only update DB when count crosses thresholds: 100, 1000, 10000, ...
  Display: "1.2M views" is fine — exact count unimportant

Milestone events:
  On write: check if new count crosses 1M, 10M, 100M milestones
  → Trigger: notify creator, update trending algorithm, badge
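The milestone check fits naturally into the batch flush, since that is the only place the stored count changes; a sketch (the milestone set matches the examples above):

```python
MILESTONES = (1_000_000, 10_000_000, 100_000_000)

def crossed_milestones(old_count: int, new_count: int) -> list:
    """Milestones passed by this batch update (old < m <= new)."""
    return [m for m in MILESTONES if old_count < m <= new_count]
```

Checking against the pre-update count means a milestone fires exactly once even though individual views arrive out of order.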

Live Streaming Architecture

Live streaming vs VOD (Video on Demand):
  VOD:  entire video pre-processed; segments pre-generated
  Live: real-time encoding; low latency required

Live ingest:
  Creator → RTMP client (OBS, mobile app)
          → RTMP ingest server
          → Transcoder: encode to HLS segments in real-time
          → 2-second segment length (lower = lower latency, more requests)
          → S3 + CDN: segments available ~6 seconds after capture

Ultra-low latency (WebRTC for < 1s delay):
  Used for: interactive live streams, auctions, sports betting
  WebRTC peer-to-peer or SFU (Selective Forwarding Unit) architecture
  See: System Design: WebRTC and Real-Time Video Architecture

Interview Discussion Points

  • Why segment videos into small chunks? Seekability: client jumps to any position by calculating the segment number. Adaptive quality: client switches quality at segment boundaries. Resilience: download one segment at a time; network interruption → just re-download that segment.
  • H.264 vs H.265 trade-off? H.265 (HEVC) produces 50% smaller files at equivalent quality, but requires 4× more CPU to encode. For Netflix: H.265 saves significant CDN costs at scale. Browser support: H.265 not universally supported (Edge yes, Chrome partially) → must serve both formats based on client capabilities.
  • How does Netflix achieve < 2s start time? Pre-buffer: on hover, client fetches the first 5 segments of the most likely quality. Predictive pre-loading: user’s network measured, quality pre-selected before play button is clicked. CDN PoP selection: DNS routes to nearest PoP with the content cached.
  • How to handle concurrent viewers for a viral video? The CDN absorbs the load — each PoP serves its regional audience from local cache. The origin server (S3) only handles CDN miss requests. For truly global events (World Cup), pre-distribute all segments to all PoPs hours before kickoff.
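The seekability point above reduces to integer arithmetic on the segment length (6s segments as in the playlist; 1-based numbering is an assumption here):

```python
def segment_for_position(position_secs: float, seg_len_secs: float = 6.0,
                         first_index: int = 1) -> int:
    """Which segment file contains a given playback position."""
    return first_index + int(position_secs // seg_len_secs)
```

So a seek to 1:05 lands in segment_011.ts with no server round-trip beyond fetching that one file.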
