Designing a video streaming platform like Netflix or YouTube is one of the most comprehensive system design challenges, combining video processing pipelines, CDN architecture, adaptive bitrate streaming, and personalization at massive scale.
Core Requirements
Functional: Upload videos, transcode to multiple resolutions/formats, stream to users on any device with adaptive quality, support search and recommendations, track view history and progress.
Non-functional: 200M daily active users. 1B hours of video watched per day. Upload processing within 30 minutes. Stream start time < 2 seconds. 99.99% availability. Support 4K, 1080p, 720p, 480p, 360p.
Video Upload and Processing Pipeline
Upload flow:
Creator → Upload API → S3 (raw video, presigned PUT URL)
→ SQS: "video uploaded" event
→ Video Processing Worker
Processing pipeline stages:
1. Validation: check video format, duration, file integrity (SHA-256)
2. Transcoding: convert to multiple resolutions and formats
Tool: FFmpeg (open source) or cloud services (AWS MediaConvert)
Output per video:
360p H.264 MP4 (mobile, low bandwidth)
480p H.264 MP4
720p H.264 MP4 (standard HD)
1080p H.264 MP4
1080p H.265 HEVC (50% smaller than H.264)
4K H.265 (if source is 4K)
Audio-only AAC (for background play)
3. Thumbnail generation: extract frames at 10s intervals
4. Content moderation: run ML classifier (NSFW, copyright)
5. Packaging: segment into chunks for adaptive streaming (HLS/DASH)
6. CDN distribution: push to edge PoPs
7. Update DB: mark video as available, publish event
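The validation stage (stage 1) can be sketched as a pure function over the upload's metadata and bytes. The size/duration limits, the allowed-container set, and the idea of checking against a client-supplied checksum are illustrative assumptions, not platform-specific values:

```python
import hashlib
from dataclasses import dataclass

MAX_DURATION_SEC = 12 * 3600          # assumed upload limit
ALLOWED_CONTAINERS = {"mp4", "mov", "mkv", "webm"}

@dataclass
class UploadMeta:
    container: str          # parsed from the file header in a real system
    duration_sec: int
    claimed_sha256: str     # checksum the client sent with the upload

def sha256_of(data: bytes) -> str:
    h = hashlib.sha256()
    # Stream in chunks so multi-GB files never sit fully in memory.
    for i in range(0, len(data), 1 << 20):
        h.update(data[i:i + (1 << 20)])
    return h.hexdigest()

def validate(meta: UploadMeta, data: bytes) -> list[str]:
    """Return a list of validation errors (empty list = passes)."""
    errors = []
    if meta.container not in ALLOWED_CONTAINERS:
        errors.append(f"unsupported container: {meta.container}")
    if not 0 < meta.duration_sec <= MAX_DURATION_SEC:
        errors.append("duration out of range")
    if sha256_of(data) != meta.claimed_sha256:
        errors.append("checksum mismatch: upload corrupted")
    return errors
```

Returning all errors (rather than failing on the first) lets the creator fix everything in one retry.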
Distributed transcoding:
Split 2-hour movie into 5-minute segments
Transcode each segment in parallel across N workers
Merge segments → final output
Reduces 2-hour 4K transcode from 8 hours → 30 minutes
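The split-merge orchestration above can be sketched as follows. `transcode_segment` is a stub standing in for an FFmpeg invocation on a worker node; the 5-minute segment length mirrors the numbers above, and the worker count is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SEC = 5 * 60  # 5-minute segments

def split(duration_sec: int) -> list[tuple[int, int]]:
    """Return (start, end) offsets covering the whole video."""
    return [(s, min(s + SEGMENT_SEC, duration_sec))
            for s in range(0, duration_sec, SEGMENT_SEC)]

def transcode_segment(seg: tuple[int, int], quality: str) -> str:
    # Real system: ffmpeg -ss {start} -to {end} -i src ... per quality.
    return f"{quality}/seg_{seg[0]:06d}.mp4"

def transcode(duration_sec: int, qualities: list[str]) -> dict[str, list[str]]:
    segments = split(duration_sec)
    out: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=32) as pool:
        for q in qualities:
            # Segments transcode in parallel, then are "merged" by
            # collecting the results in order.
            out[q] = list(pool.map(lambda s: transcode_segment(s, q), segments))
    return out
```

A 2-hour movie (7,200 s) splits into 24 five-minute segments, so with 24+ workers the wall-clock time is roughly the time to transcode one segment plus merge overhead.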
Adaptive Bitrate Streaming (ABR)
Problem: users have different bandwidths; static quality = bad UX
(buffering on slow connections, blurry on fast connections)
ABR: client switches quality mid-stream based on current bandwidth
HLS (HTTP Live Streaming) — Apple, widely supported:
Master playlist (m3u8):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
Each quality-level playlist (360p/index.m3u8):
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
segment_001.ts
#EXTINF:6.0,
segment_002.ts
...
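A minimal generator for the two playlist types shown above (bitrates and resolutions match the example ladder; they are illustrative, not a recommendation):

```python
LADDER = [  # (name, bandwidth bits/s, resolution)
    ("360p", 800_000, "640x360"),
    ("720p", 2_800_000, "1280x720"),
    ("1080p", 5_000_000, "1920x1080"),
]

def master_playlist() -> str:
    lines = ["#EXTM3U"]
    for name, bw, res in LADDER:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bw},RESOLUTION={res}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines)

def media_playlist(num_segments: int, seg_dur: float = 6.0) -> str:
    lines = ["#EXTM3U", f"#EXT-X-TARGETDURATION:{int(seg_dur)}",
             "#EXT-X-MEDIA-SEQUENCE:0"]
    for i in range(1, num_segments + 1):
        lines.append(f"#EXTINF:{seg_dur:.1f},")
        lines.append(f"segment_{i:03d}.ts")
    lines.append("#EXT-X-ENDLIST")  # VOD marker: the playlist is complete
    return "\n".join(lines)
```

The `#EXT-X-ENDLIST` tag is what distinguishes VOD from live: live playlists omit it so players keep polling for new segments.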
Client logic:
- Download master playlist → pick initial quality based on bandwidth estimate
- Download segments → measure download speed
- Buffer draining or speed low: drop to a lower quality; buffer ≥ 20s and speed high: step up to a higher quality
- Target: maintain 20-30s buffer for smooth playback
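The client logic above reduces to a buffer-aware quality selector. A minimal sketch; the 80% safety factor and the 10-second panic threshold are assumed tuning values:

```python
BITRATES = [800_000, 2_800_000, 5_000_000]  # 360p, 720p, 1080p

def choose_quality(bandwidth_bps: float, buffer_sec: float) -> int:
    """Return an index into BITRATES for the next segment download."""
    if buffer_sec < 10:
        return 0  # buffer nearly empty: take the safest rendition
    # Only use ~80% of measured bandwidth to leave headroom for variance.
    usable = bandwidth_bps * 0.8
    best = 0
    for i, rate in enumerate(BITRATES):
        if rate <= usable:
            best = i
    return best
```

Because the decision runs per segment, a quality change takes effect at the next segment boundary with no visible glitch.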
CDN Architecture
Video storage distribution:
Origin: S3 (all video segments, all qualities)
CDN (CloudFront / Akamai / Fastly):
300+ PoPs worldwide
Each PoP caches popular video segments
Cache hit ratio: ~80% (hot content cached at edge)
Cache miss → origin pull → cache at edge (5-10s penalty)
Cache key: {video_id}/{quality}/{segment_number}.ts
e.g., cdn.netflix.com/v/xyz123/1080p/segment_042.ts
Popular content (top 10%): pre-warmed at all PoPs
When video is trending → push-based distribution to edge
Long-tail content: served from nearest PoP or origin on-demand
First viewer in a region pulls from origin → cached for next viewers
Cache TTL:
Video segments (immutable): Cache-Control: max-age=31536000 (1 year)
Playlists (can change): Cache-Control: max-age=5 (live) or 60 (VOD)
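The cache-key scheme and origin-pull behavior above can be modeled with a toy edge-PoP cache. The capacity and LRU eviction policy are simplifications of what a real CDN node does:

```python
from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity: int, origin: dict[str, bytes]):
        self.capacity = capacity
        self.origin = origin               # stands in for S3
        self.cache: OrderedDict[str, bytes] = OrderedDict()
        self.hits = self.misses = 0

    @staticmethod
    def key(video_id: str, quality: str, segment: int) -> str:
        return f"{video_id}/{quality}/segment_{segment:03d}.ts"

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)    # LRU bookkeeping
            return self.cache[key]
        self.misses += 1                   # origin pull (the 5-10s penalty)
        data = self.origin[key]
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used
        return data
```

Tracking `hits / (hits + misses)` per PoP is how the ~80% cache-hit-ratio figure above would be measured in practice.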
Video Storage Architecture
Storage tiers by access frequency:
Hot tier (< 30 days, frequently accessed): S3 Standard
Warm tier (30-180 days): S3 Standard-IA
Cold tier (180 days to 2 years): S3 Glacier
Archive tier (> 2 years, rarely accessed): S3 Glacier Deep Archive
Cost: $0.00099/GB/month, 12-48 hour retrieval
Lifecycle policy (automated):
Day 0: S3 Standard
Day 30: → S3-IA
Day 180: → Glacier
Day 730: → Glacier Deep Archive
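The lifecycle policy above maps directly onto an S3 bucket lifecycle configuration. A minimal sketch; the `videos/` prefix and rule ID are assumptions:

```json
{
  "Rules": [
    {
      "ID": "video-archival",
      "Status": "Enabled",
      "Filter": { "Prefix": "videos/" },
      "Transitions": [
        { "Days": 30,  "StorageClass": "STANDARD_IA" },
        { "Days": 180, "StorageClass": "GLACIER" },
        { "Days": 730, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```

Applied once via `PutBucketLifecycleConfiguration`, S3 then migrates objects between tiers automatically with no application code.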
Storage at YouTube scale:
500 hours of video uploaded per minute
Avg 1 hour = 2GB raw = 1GB after transcoding (all qualities)
500 GB/min = 720TB/day = 263PB/year
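The back-of-envelope arithmetic above, written out (using decimal units, 1 PB = 1000 TB):

```python
hours_per_min = 500                  # upload rate
gb_per_hour = 1                      # post-transcode, all qualities combined

gb_per_min = hours_per_min * gb_per_hour          # 500 GB/min
tb_per_day = gb_per_min * 60 * 24 / 1000          # 720 TB/day
pb_per_year = tb_per_day * 365 / 1000             # ~263 PB/year
```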
Database Design
videos table (PostgreSQL/Spanner):
id UUID PRIMARY KEY
title TEXT
description TEXT
creator_id BIGINT
status ENUM(processing, published, removed)
duration_sec INT
view_count BIGINT DEFAULT 0
published_at TIMESTAMPTZ
INDEX (creator_id, published_at DESC)
INDEX (status, published_at DESC) -- for feed queries
video_qualities table:
video_id UUID
quality ENUM(360p, 480p, 720p, 1080p, 4K)
storage_path TEXT -- S3 key
size_bytes BIGINT
PRIMARY KEY (video_id, quality)
view_events (ClickHouse / BigQuery — analytics):
event_id UUID
video_id UUID
user_id BIGINT
started_at TIMESTAMPTZ
watch_secs INT
quality TEXT
device TEXT
country TEXT
user_watch_history (Cassandra — high write rate):
user_id BIGINT PARTITION KEY
watched_at TIMESTAMPTZ CLUSTERING KEY (descending)
video_id UUID
progress_sec INT -- resume position
View Counter at Scale
Problem: 1B views/day = 11,600 views/sec
Incrementing DB view_count per view → DB bottleneck
Solution: counter aggregation
Redis INCR video:views:{video_id} (atomic, fast)
Background job every 30s:
- Flush Redis counters to DB in batch
- GETSET video:views:{video_id} 0 (atomic read + reset)
- UPDATE videos SET view_count = view_count + {delta} WHERE id = ?
Approximate counts (YouTube approach):
Only update DB when count crosses thresholds: 100, 1000, 10000, ...
Display: "1.2M views" is fine — exact count unimportant
Milestone events:
On write: check if new count crosses 1M, 10M, 100M milestones
→ Trigger: notify creator, update trending algorithm, badge
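The counter aggregation and milestone check above can be sketched with plain dicts standing in for Redis and the DB. The `redis` dict mimics `INCR`/`GETSET` semantics; milestone values match the list above:

```python
MILESTONES = [1_000_000, 10_000_000, 100_000_000]

redis: dict[str, int] = {}     # video_id -> pending (unflushed) views
db: dict[str, int] = {}        # video_id -> persisted view_count

def record_view(video_id: str) -> None:
    redis[video_id] = redis.get(video_id, 0) + 1   # Redis: INCR

def flush() -> list[tuple[str, int]]:
    """Periodic job: drain counters into the DB, return crossed milestones."""
    crossed = []
    for vid in list(redis):
        delta, redis[vid] = redis[vid], 0          # Redis: GETSET key 0
        old = db.get(vid, 0)
        db[vid] = old + delta                      # batched UPDATE ... + delta
        for m in MILESTONES:
            if old < m <= db[vid]:
                crossed.append((vid, m))           # notify creator, trending, badge
    return crossed
```

Checking `old < m <= new` on the batched delta (not per view) is what makes milestone detection cheap at 11,600 views/sec.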
Live Streaming Architecture
Live streaming vs VOD (Video on Demand):
VOD: entire video pre-processed; segments pre-generated
Live: real-time encoding; low latency required
Live ingest:
Creator → RTMP client (OBS, mobile app)
→ RTMP ingest server
→ Transcoder: encode to HLS segments in real-time
→ 2-second segment length (lower = lower latency, more requests)
→ S3 + CDN: segments available ~6 seconds after capture
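Unlike the VOD playlist, a live playlist exposes only a short sliding window of recent segments and advances `#EXT-X-MEDIA-SEQUENCE` as old segments fall off. A sketch with the 2-second segments from above; the 3-segment window is illustrative:

```python
def live_playlist(latest_seq: int, window: int = 3, seg_dur: int = 2) -> str:
    first = max(0, latest_seq - window + 1)
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{seg_dur}",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",
    ]
    for seq in range(first, latest_seq + 1):
        lines.append(f"#EXTINF:{seg_dur}.0,")
        lines.append(f"segment_{seq:06d}.ts")
    # No #EXT-X-ENDLIST: the stream is live, so players re-poll the playlist.
    return "\n".join(lines)
```

The window length bounds how far behind live a late joiner can start, and the missing end tag is what tells the player to keep polling.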
Ultra-low latency (WebRTC for < 1s delay):
Used for: interactive live streams, auctions, sports betting
WebRTC peer-to-peer or SFU (Selective Forwarding Unit) architecture
See: System Design: WebRTC and Real-Time Video Architecture
Interview Discussion Points
- Why segment videos into small chunks? Seekability: client jumps to any position by calculating the segment number. Adaptive quality: client switches quality at segment boundaries. Resilience: download one segment at a time; network interruption → just re-download that segment.
- H.264 vs H.265 trade-off? H.265 (HEVC) produces 50% smaller files at equivalent quality, but requires 4× more CPU to encode. For Netflix: H.265 saves significant CDN costs at scale. Browser support: H.265 not universally supported (Edge yes, Chrome partially) → must serve both formats based on client capabilities.
- How does Netflix achieve < 2s start time? Pre-buffer: on hover, client fetches the first 5 segments of the most likely quality. Predictive pre-loading: user’s network measured, quality pre-selected before play button is clicked. CDN PoP selection: DNS routes to nearest PoP with the content cached.
- How to handle concurrent viewers for a viral video? The CDN absorbs the load — each PoP serves its regional audience from local cache. The origin server (S3) only handles CDN miss requests. For truly global events (World Cup), pre-distribute all segments to all PoPs hours before kickoff.
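The seekability point above is just integer division: with fixed-length segments, a seek position maps straight to a segment number, so no server-side index lookup is needed:

```python
def segment_for(position_sec: float, seg_dur: float = 6.0) -> int:
    """Which segment (0-indexed) contains the given playback position."""
    return int(position_sec // seg_dur)
```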