System Design Interview: Design YouTube / Video Streaming Platform
Designing YouTube is one of the most comprehensive system design problems — it covers video upload pipelines, adaptive bitrate streaming, CDN architecture, recommendation systems, and massive scale storage. It’s frequently asked at Netflix, YouTube/Google, Meta, and Twitch.
Functional Requirements
- Upload videos (any format, up to several hours long)
- Stream videos to users globally with adaptive quality
- Search videos by title, description, and tags
- Like, comment, and subscribe to channels
- Personalized video recommendations
- Track view counts and watch history
Non-Functional Requirements
- Scale: 500 hours of video uploaded per minute; 1B+ hours watched per day
- Storage: ~500hr/min × 60min × 1GB/hr = ~30TB raw video per hour
- Read-heavy: upload:watch ratio ≈ 1:200
- Low latency streaming: <2s startup time; buffer-free playback
- Global availability: serve users in 190+ countries
Capacity Estimation
- Uploads: 500 hours/min = 8.3 hours/sec → ~8.3GB/sec raw video ingestion (at ~1GB per video-hour)
- Processed storage: raw 30TB/hr × 5 resolutions × 0.3 compression ≈ 45TB processed per hour of ingestion
- View throughput: 1B hours/day ÷ 86,400s ≈ 11,600 video-hours consumed per second (≈ 42M concurrent streams); peak ~3× average
- Metadata: 800M videos × 1KB = 800GB for video metadata
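The estimates above are easy to sanity-check with back-of-envelope arithmetic. The constants below (1GB per raw video-hour, 5 resolutions, 0.3 compression ratio) are the same rough assumptions used in this section, not measured figures:

```python
# Back-of-envelope capacity math (rough assumptions, not measured figures).

UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_VIDEO_HOUR = 1        # assumed average raw bitrate ≈ 2.2 Mbps
RESOLUTIONS = 5
COMPRESSION_RATIO = 0.3

raw_tb_per_hour = UPLOAD_HOURS_PER_MIN * 60 * RAW_GB_PER_VIDEO_HOUR / 1000
ingest_gb_per_sec = raw_tb_per_hour * 1000 / 3600
processed_tb_per_hour = raw_tb_per_hour * RESOLUTIONS * COMPRESSION_RATIO

watch_hours_per_day = 1_000_000_000
video_hours_per_sec = watch_hours_per_day / 86_400   # ~11,600
concurrent_streams = video_hours_per_sec * 3600      # ~42M

print(f"raw ingest: {raw_tb_per_hour:.0f} TB/hr ({ingest_gb_per_sec:.1f} GB/s)")
print(f"processed:  {processed_tb_per_hour:.0f} TB/hr")
print(f"~{video_hours_per_sec:,.0f} video-hours consumed/sec "
      f"≈ {concurrent_streams / 1e6:.0f}M concurrent streams")
```

In an interview it is the method that matters: state your unit assumptions out loud, since changing the assumed raw bitrate shifts every downstream number.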
High-Level Architecture
[Upload Client]
      │ (chunked multipart)
      ▼
[Upload Service] ──► [Raw Video Storage (S3)]
      │                        │
      ▼                        ▼
[Metadata DB]         [Processing Queue (SQS)]
(video title,                  │
 duration, tags)               ▼
                      [Video Processing Workers]
                      (FFmpeg: transcode to
                       360p, 480p, 720p, 1080p, 4K)
                               │
                               ▼
                      [Processed Video Storage (S3)]
                               │
                               ▼
                      [CDN (CloudFront/Akamai)]
                               │
                               ▼
                      [Streaming Clients]
                      (HLS/DASH adaptive
                       bitrate playback)
Video Upload Pipeline
class VideoUploadService:
    """
    Chunked upload to handle large files reliably.
    Client splits video into 5-10MB chunks, uploads each independently.
    Server reassembles and hands off to processing pipeline.
    """
    def initiate_upload(self, user_id: str, filename: str,
                        total_size: int) -> dict:
        video_id = generate_snowflake_id()
        upload_id = s3.create_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original"
        )['UploadId']
        # Store upload session
        upload_sessions.set(upload_id, {
            'video_id': video_id,
            'user_id': user_id,
            'filename': filename,
            'total_size': total_size,
            'uploaded_chunks': {},
            'status': 'uploading'
        }, ttl=86400)  # 24h to complete
        return {'video_id': video_id, 'upload_id': upload_id}

    def upload_chunk(self, upload_id: str, chunk_number: int,
                     chunk_data: bytes) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        response = s3.upload_part(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            PartNumber=chunk_number,
            Body=chunk_data
        )
        session['uploaded_chunks'][chunk_number] = response['ETag']
        upload_sessions.set(upload_id, session)
        return {'chunk': chunk_number, 'etag': response['ETag']}

    def complete_upload(self, upload_id: str) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        # Complete S3 multipart upload (parts must be in PartNumber order)
        parts = [{'PartNumber': n, 'ETag': etag}
                 for n, etag in sorted(session['uploaded_chunks'].items())]
        s3.complete_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
        # Queue for processing
        processing_queue.publish({
            'video_id': video_id,
            'user_id': session['user_id'],
            's3_key': f"{video_id}/original"
        })
        return {'video_id': video_id, 'status': 'processing'}
Video Processing Pipeline
import subprocess
from concurrent.futures import ThreadPoolExecutor

class VideoProcessor:
    """
    Transcodes raw video to multiple resolutions using FFmpeg.
    Runs as distributed workers consuming from processing queue.
    """
    RESOLUTIONS = [
        ('360p',   640,  360,    500_000),  # 500 Kbps
        ('480p',   854,  480,  1_000_000),  # 1 Mbps
        ('720p',  1280,  720,  2_500_000),  # 2.5 Mbps
        ('1080p', 1920, 1080,  5_000_000),  # 5 Mbps
        ('4k',    3840, 2160, 15_000_000),  # 15 Mbps
    ]

    def process(self, video_id: str, s3_key: str):
        # Download from S3
        local_path = self._download(s3_key)
        # Extract metadata (duration, fps, codec)
        metadata = self._probe(local_path)
        # Transcode each applicable resolution in parallel
        included = [res for res in self.RESOLUTIONS
                    if self._should_include(metadata, res)]
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self._transcode, local_path, video_id, res)
                for res in included
            ]
            results = [f.result() for f in futures]
        # Generate HLS playlist (master + per-resolution)
        self._generate_hls_manifest(video_id, results)
        # Generate thumbnail from frame at 10% duration
        self._extract_thumbnail(local_path, video_id, metadata['duration'])
        # Update video status to published (record only the resolutions produced)
        videos_db.update(video_id, {
            'status': 'published',
            'duration': metadata['duration'],
            'resolutions': [r[0] for r in included],
            'thumbnail_url': f"https://cdn.example.com/thumbs/{video_id}.jpg"
        })

    def _transcode(self, input_path: str, video_id: str,
                   resolution: tuple) -> str:
        name, width, height, bitrate = resolution
        output_key = f"processed/{video_id}/{name}.m3u8"
        # FFmpeg command (simplified; a real pipeline also uploads the .ts segments)
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f'scale={width}:{height}',
            '-b:v', str(bitrate),
            '-codec:a', 'aac', '-b:a', '128k',
            '-hls_time', '6',  # 6-second segments
            '-hls_playlist_type', 'vod',
            f'/tmp/{video_id}_{name}.m3u8'
        ]
        subprocess.run(cmd, check=True)
        s3.upload(output_key, f'/tmp/{video_id}_{name}.m3u8')
        return output_key

    def _generate_hls_manifest(self, video_id: str, playlist_keys: list):
        """Generate master HLS playlist for adaptive bitrate."""
        bandwidth_map = {
            '360p': 500_000, '480p': 1_000_000,
            '720p': 2_500_000, '1080p': 5_000_000, '4k': 15_000_000
        }
        lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
        for key in playlist_keys:
            res = key.split('/')[-1].replace('.m3u8', '')
            bw = bandwidth_map.get(res, 1_000_000)
            lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bw}")
            lines.append(key)
        s3.upload(f"processed/{video_id}/master.m3u8", "\n".join(lines))

    def _should_include(self, metadata: dict, resolution: tuple) -> bool:
        """Don't upscale: skip resolutions higher than the source."""
        _, _, height, _ = resolution
        return metadata.get('height', 0) >= height
Adaptive Bitrate Streaming (ABR)
YouTube uses DASH (Dynamic Adaptive Streaming over HTTP); Apple platforms use HLS (HTTP Live Streaming). Both work the same way:
- Video is split into short segments (2-10 seconds each)
- Each segment is encoded at multiple bitrates
- Client player monitors bandwidth and switches quality per-segment
- Master playlist tells the client which URL to use for each quality level
class ABRPlayer:
    """Client-side adaptive bitrate logic (conceptual)."""
    def __init__(self):
        self.buffer_size = 0           # Seconds of video buffered
        self.current_quality = '480p'
        self.download_speeds = []      # Recent segment download speeds (bits/sec)

    def select_quality(self) -> str:
        # Average over the last 5 segments only
        recent = self.download_speeds[-5:]
        avg_speed = sum(recent) / max(len(recent), 1)
        # Map bandwidth to quality tier (with headroom above the stream bitrate)
        if avg_speed > 10_000_000:
            return '4k'
        elif avg_speed > 4_000_000:
            return '1080p'
        elif avg_speed > 2_000_000:
            return '720p'
        elif avg_speed > 800_000:
            return '480p'
        return '360p'

    def should_buffer(self) -> bool:
        return self.buffer_size < 15   # Buffer 15 seconds ahead
CDN Strategy
YouTube’s CDN (Google Global Cache) uses a two-tier approach:
- ISP-embedded caches: Google negotiates with ISPs to place cache servers inside their networks. Popular videos are pre-seeded. Cache hit rate >95% for top content.
- Edge PoPs: 130+ points of presence worldwide. Cache misses fetch from origin (S3) and populate the edge.
- Cache key: video_id + resolution + segment_number — segment-level granularity allows efficient caching even if the user seeks to a non-start position.
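The two-tier lookup above can be sketched in a few lines. The key format and the in-memory store are illustrative stand-ins; a real edge node would also handle TTLs, eviction, and origin failures:

```python
# Sketch of segment-level CDN cache keying (names and URL layout are illustrative).

def segment_cache_key(video_id: str, resolution: str, segment: int) -> str:
    # Segment-level granularity: a seek to minute 40 only needs the segments
    # around that point, each cacheable independently of the rest.
    return f"{video_id}/{resolution}/seg_{segment:06d}.ts"

class EdgeCache:
    def __init__(self, origin_fetch):
        self._store = {}
        self._origin_fetch = origin_fetch  # called only on cache miss

    def get(self, key: str) -> bytes:
        if key not in self._store:         # miss → pull from origin (S3)
            self._store[key] = self._origin_fetch(key)
        return self._store[key]            # hit → served from edge memory

# usage: second request for the same segment never touches the origin
cache = EdgeCache(origin_fetch=lambda key: b"<segment bytes>")
key = segment_cache_key("dQw4w9WgXcQ", "720p", 412)
cache.get(key)   # miss: fetched from origin, populates the edge
cache.get(key)   # hit: served from the edge
```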
Metadata and Search
-- Video metadata: relational DB at small-medium scale
-- (inline INDEX is MySQL syntax; Postgres uses separate CREATE INDEX statements)
CREATE TABLE videos (
    video_id    BIGINT PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    title       VARCHAR(500),
    description TEXT,
    duration    INT,           -- seconds
    view_count  BIGINT DEFAULT 0,
    like_count  BIGINT DEFAULT 0,
    status      VARCHAR(20),   -- processing, published, removed
    created_at  TIMESTAMP,
    INDEX (user_id),
    INDEX (created_at)
);
# Search: Elasticsearch index
# Documents: {video_id, title, description, tags, channel_name}
# Query: multi-match across title (boost: 3), tags (boost: 2), description
# Ranking: BM25 score × recency_boost × view_count_boost
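The boosted multi-match plus ranking factors could be expressed with Elasticsearch's `function_score` query. The index layout and the boost/decay constants below are assumptions for illustration:

```python
# Hypothetical Elasticsearch query body for the boosted multi-match above.
# Field names, factors, and decay scale are assumptions, not tuned values.
search_body = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "system design interview",
                    # ^N is per-field boost: title 3x, tags 2x
                    "fields": ["title^3", "tags^2", "description"],
                }
            },
            "functions": [
                # view_count boost: log1p dampens runaway popularity
                {"field_value_factor": {"field": "view_count",
                                        "modifier": "log1p",
                                        "factor": 0.1}},
                # recency boost: gaussian decay over ~30 days
                {"gauss": {"created_at": {"origin": "now", "scale": "30d"}}},
            ],
            "score_mode": "sum",        # combine the two functions
            "boost_mode": "multiply",   # BM25 score × combined boost
        }
    }
}
```

In an interview, the key point is the shape: text relevance (BM25) produced by the inner query, multiplied by engagement and recency signals applied as score functions.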
View Count Scalability
"""
Naive: UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?
Problem: 11,600 concurrent streams × 1 update/second = 11,600 writes/sec
→ database bottleneck
Solution: Count-min sketch + Redis atomic counters
"""
class ViewCountService:
    BATCH_FLUSH_INTERVAL = 60       # Seconds
    BATCH_SIZE_THRESHOLD = 10_000

    def record_view(self, video_id: str, user_id: str):
        # Deduplicate: don't count re-watches within 24h
        key = f"view:{user_id}:{video_id}"
        if redis.set(key, 1, nx=True, ex=86400):  # NX = only set if not exists
            # Increment atomic counter in Redis (serves real-time reads)
            redis.incr(f"view_count:{video_id}")
            # Add to batch queue for DB flush
            redis.rpush("view_count_batch", f"{video_id}:{int(time.time())}")

    def flush_to_db(self):
        """Runs periodically (every 60s) to batch-update the DB."""
        counts = {}
        batch_size = redis.llen("view_count_batch")
        for _ in range(min(batch_size, self.BATCH_SIZE_THRESHOLD)):
            entry = redis.lpop("view_count_batch")  # bytes in redis-py; decode if needed
            video_id = entry.split(':')[0]
            counts[video_id] = counts.get(video_id, 0) + 1
        # Batch update: one UPDATE per video, not one per view
        for video_id, count in counts.items():
            db.execute(
                "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
                [count, video_id]
            )
Recommendation System (Simplified)
- Candidate generation: Collaborative filtering (users with similar watch history); content-based (same channel, tags, topic)
- Ranking: ML model trained on engagement signals (click-through rate, watch percentage, likes, shares)
- Two-stage funnel: Retrieve thousands of candidates → rank top 100 → present top 10 with diversity constraints
- Offline training: Daily batch jobs on BigQuery/Spark; model pushed to serving infrastructure
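The two-stage funnel above can be sketched end to end. The candidate sources and the ranking score are stand-ins for collaborative filtering and a trained engagement model; the per-channel cap is one simple diversity constraint:

```python
# Minimal sketch of the retrieve-then-rank funnel. `candidate_sources` and
# `rank_score` are stand-ins for real CF retrieval and a trained ML ranker.

def recommend(user, candidate_sources, rank_score, k_retrieve=1000, k_final=10):
    # Stage 1: cheap retrieval from several sources, deduplicated by video id
    candidates = {}
    for source in candidate_sources:
        for video in source(user):
            candidates[video["id"]] = video
    pool = list(candidates.values())[:k_retrieve]

    # Stage 2: expensive ranking, but only over the small retrieved pool
    ranked = sorted(pool, key=lambda v: rank_score(user, v), reverse=True)

    # Diversity constraint: at most 2 videos per channel in the final slate
    slate, per_channel = [], {}
    for v in ranked:
        ch = v["channel"]
        if per_channel.get(ch, 0) < 2:
            slate.append(v)
            per_channel[ch] = per_channel.get(ch, 0) + 1
        if len(slate) == k_final:
            break
    return slate
```

The design point to call out: retrieval must be cheap because it scans millions of videos, while the ranker can be expensive because it only ever sees the retrieved pool.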
Interview Discussion Points
- Resume upload: Track uploaded chunks server-side; client queries which chunks are missing and resumes from there
- Geographic compliance: Geo-block at CDN level using viewer IP; serve different content per region for licensing
- Copyright detection: Content ID — fingerprint every uploaded video; scan against database of copyrighted content using audio/video fingerprinting (perceptual hash)
- Hot vs cold storage: Popular videos on fast SSDs/CDN; videos with zero views for 6 months moved to Glacier (archival storage)
- Live streaming: Different pipeline — RTMP ingest → real-time segmentation → low-latency HLS (LL-HLS with 2-second segments)
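The resume-upload point can be made concrete: given the chunk bookkeeping from the upload service, the server reports which part numbers are missing and the client re-sends only those. The chunk size and helper name are illustrative:

```python
# Sketch of resumable upload: the server reports missing chunks so the client
# re-uploads only those. The session dict mirrors the upload service's
# bookkeeping; the 8MB chunk size is an assumption.

def missing_chunks(session: dict) -> list[int]:
    chunk_size = 8 * 1024 * 1024                       # assumed 8MB chunks
    total = -(-session["total_size"] // chunk_size)    # ceiling division
    have = set(session["uploaded_chunks"])
    return [n for n in range(1, total + 1) if n not in have]

# usage: 100MB file → 13 chunks; parts 3 and 7 were lost mid-transfer
session = {
    "total_size": 100 * 1024 * 1024,
    "uploaded_chunks": {n: f"etag-{n}" for n in range(1, 14) if n not in (3, 7)},
}
missing_chunks(session)   # → [3, 7]
```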