System Design Interview: Design YouTube / Video Streaming Platform
Designing YouTube is one of the most comprehensive system design problems — it covers video upload pipelines, adaptive bitrate streaming, CDN architecture, recommendation systems, and massive scale storage. It’s frequently asked at Netflix, YouTube/Google, Meta, and Twitch.
Functional Requirements
- Upload videos (any format, up to several hours long)
- Stream videos to users globally with adaptive quality
- Search videos by title, description, and tags
- Like, comment, and subscribe to channels
- Personalized video recommendations
- Track view counts and watch history
Non-Functional Requirements
- Scale: 500 hours of video uploaded per minute; 1B+ hours watched per day
- Storage: ~500hr/min × 60min × 1GB/hr = ~30TB raw video per hour
- Read-heavy: upload:watch ratio ≈ 1:200
- Low latency streaming: <2s startup time; buffer-free playback
- Global availability: serve users in 190+ countries
Capacity Estimation
- Uploads: 500 hours/min ≈ 8.3 hours/sec → ~8.3GB/sec raw video ingestion (at the ~1GB/hr assumed above, consistent with 30TB/hour)
- Processed storage: raw × 5 resolutions × 0.3 compression ≈ 1.5GB per upload-hour → ~45TB of processed video per wall-clock hour
- View throughput: 1B hours/day ÷ 86,400s ≈ 11,600 hours of video served per second (~42M concurrent streams); peak ~3×
- Metadata: 800M videos × 1KB = 800GB for video metadata
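These numbers are worth sanity-checking out loud in the interview. A quick back-of-envelope script (the ~1GB/hr raw bitrate and 0.3 compression ratio are the assumptions stated above):

UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_HOUR = 1              # assumed raw bitrate (~2.2 Mbps)
NUM_RESOLUTIONS = 5
COMPRESSION_RATIO = 0.3          # processed size vs. raw, per resolution
WATCH_HOURS_PER_DAY = 1_000_000_000

ingest_gb_per_sec = UPLOAD_HOURS_PER_MIN / 60 * RAW_GB_PER_HOUR        # ~8.3 GB/s
raw_tb_per_hour = UPLOAD_HOURS_PER_MIN * 60 * RAW_GB_PER_HOUR / 1000   # ~30 TB/h
processed_tb_per_hour = raw_tb_per_hour * NUM_RESOLUTIONS * COMPRESSION_RATIO  # ~45 TB/h
served_hours_per_sec = WATCH_HOURS_PER_DAY / 86_400                    # ~11,600
concurrent_streams = WATCH_HOURS_PER_DAY / 24                          # ~42M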
High-Level Architecture
[Upload Client]
      │ (chunked multipart)
      ▼
[Upload Service] ──► [Raw Video Storage (S3)]
      │                         │
      │               [Processing Queue (SQS)]
      │                         │
      │              [Video Processing Workers]
      │               (FFmpeg: transcode to
      │                360p, 480p, 720p, 1080p, 4K)
      │                         │
      │              [Processed Video Storage (S3)]
      │                         │
      │                [CDN (CloudFront/Akamai)]
      │                         │
[Metadata DB]           [Streaming Clients]
(video title,           (HLS/DASH adaptive
 duration, tags)         bitrate playback)
Video Upload Pipeline
import boto3

s3 = boto3.client('s3')
# upload_sessions (a TTL key-value store, e.g. Redis), processing_queue
# (a thin SQS wrapper), and generate_snowflake_id are elided helpers.

class VideoUploadService:
    """
    Chunked upload to handle large files reliably.
    Client splits video into 5-10MB chunks, uploads each independently.
    Server reassembles and hands off to the processing pipeline.
    """

    def initiate_upload(self, user_id: str, filename: str,
                        total_size: int) -> dict:
        video_id = generate_snowflake_id()
        upload_id = s3.create_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original"
        )['UploadId']
        # Store upload session so chunks can arrive out of order and resume
        upload_sessions.set(upload_id, {
            'video_id': video_id,
            'user_id': user_id,
            'filename': filename,
            'total_size': total_size,
            'uploaded_chunks': {},
            'status': 'uploading'
        }, ttl=86400)  # 24h to complete
        return {'video_id': video_id, 'upload_id': upload_id}

    def upload_chunk(self, upload_id: str, chunk_number: int,
                     chunk_data: bytes) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        response = s3.upload_part(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            PartNumber=chunk_number,
            Body=chunk_data
        )
        # Record the ETag; S3 requires it to finalize the multipart upload
        session['uploaded_chunks'][chunk_number] = response['ETag']
        upload_sessions.set(upload_id, session)
        return {'chunk': chunk_number, 'etag': response['ETag']}

    def complete_upload(self, upload_id: str) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        # Complete the S3 multipart upload; parts must be listed in order
        parts = [{'PartNumber': n, 'ETag': etag}
                 for n, etag in sorted(session['uploaded_chunks'].items())]
        s3.complete_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
        # Queue for processing
        processing_queue.publish({
            'video_id': video_id,
            'user_id': session['user_id'],
            's3_key': f"{video_id}/original"
        })
        return {'video_id': video_id, 'status': 'processing'}
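Two design choices worth calling out here: a dropped connection loses at most one 5-10MB chunk instead of the whole file, and chunks can be uploaded in parallel. S3's multipart API maps onto this model directly, so the upload service never has to buffer or reassemble the full video itself.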
Video Processing Pipeline
import subprocess
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client('s3')
# videos_db and the _download/_probe/_extract_thumbnail helpers are elided.

class VideoProcessor:
    """
    Transcodes raw video to multiple resolutions using FFmpeg.
    Runs as distributed workers consuming from the processing queue.
    """
    RESOLUTIONS = [
        ('360p',   640,  360,    500_000),   # 500 Kbps
        ('480p',   854,  480,  1_000_000),   # 1 Mbps
        ('720p',  1280,  720,  2_500_000),   # 2.5 Mbps
        ('1080p', 1920, 1080,  5_000_000),   # 5 Mbps
        ('4k',    3840, 2160, 15_000_000),   # 15 Mbps
    ]

    def process(self, video_id: str, s3_key: str):
        # Download the raw upload from S3 to local disk
        local_path = self._download(s3_key)
        # Extract metadata (duration, fps, codec) with ffprobe
        metadata = self._probe(local_path)
        # Transcode each resolution in parallel, skipping anything above source
        targets = [res for res in self.RESOLUTIONS
                   if self._should_include(metadata, res)]
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self._transcode, local_path, video_id, res)
                for res in targets
            ]
            results = [f.result() for f in futures]
        # Generate master HLS playlist pointing at per-resolution playlists
        self._generate_hls_manifest(video_id, results)
        # Generate thumbnail from frame at 10% duration
        self._extract_thumbnail(local_path, video_id, metadata['duration'])
        # Publish, recording only the resolutions actually produced
        videos_db.update(video_id, {
            'status': 'published',
            'duration': metadata['duration'],
            'resolutions': [r[0] for r in targets],
            'thumbnail_url': f"https://cdn.example.com/thumbs/{video_id}.jpg"
        })

    def _transcode(self, input_path: str, video_id: str,
                   resolution: tuple) -> str:
        name, width, height, bitrate = resolution
        output_key = f"processed/{video_id}/{name}.m3u8"
        # FFmpeg command (simplified; real jobs also pin codecs and keyframe
        # intervals so segment boundaries align across resolutions)
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f'scale={width}:{height}',
            '-b:v', str(bitrate),
            '-codec:a', 'aac', '-b:a', '128k',
            '-hls_time', '6',              # 6-second segments
            '-hls_playlist_type', 'vod',
            f'/tmp/{video_id}_{name}.m3u8'
        ]
        subprocess.run(cmd, check=True)
        # Upload the variant playlist (the .ts segments FFmpeg writes
        # alongside it must be uploaded too; elided here)
        s3.upload_file(f'/tmp/{video_id}_{name}.m3u8',
                       'processed-videos', output_key)
        return output_key

    def _generate_hls_manifest(self, video_id: str, playlist_keys: list):
        """Generate the master HLS playlist for adaptive bitrate."""
        master = "#EXTM3U\n#EXT-X-VERSION:3\n"
        bandwidth_map = {
            '360p': 500_000, '480p': 1_000_000,
            '720p': 2_500_000, '1080p': 5_000_000, '4k': 15_000_000
        }
        for key in playlist_keys:
            res = key.split('/')[-1].replace('.m3u8', '')
            bw = bandwidth_map.get(res, 1_000_000)
            master += f"#EXT-X-STREAM-INF:BANDWIDTH={bw}\n{key}\n"
        s3.put_object(Bucket='processed-videos',
                      Key=f"processed/{video_id}/master.m3u8",
                      Body=master.encode())

    def _should_include(self, metadata: dict, resolution: tuple) -> bool:
        """Don't upscale: skip resolutions higher than the source."""
        _, _, height, _ = resolution
        return metadata.get('height', 0) >= height
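For concreteness, the master playlist emitted by _generate_hls_manifest looks roughly like this (the video_id is hypothetical; production manifests also carry a RESOLUTION attribute per variant):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=500000
processed/vid123/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000
processed/vid123/480p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000
processed/vid123/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000
processed/vid123/1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15000000
processed/vid123/4k.m3u8

The player reads this once, then fetches whichever variant playlist matches its current bandwidth estimate.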
Adaptive Bitrate Streaming (ABR)
YouTube streams over DASH (Dynamic Adaptive Streaming over HTTP); Apple's ecosystem standardized on HLS (HTTP Live Streaming), and most large platforms support both. The two protocols work on the same principle:
- Video is split into short segments (2-10 seconds each)
- Each segment is encoded at multiple bitrates
- Client player monitors bandwidth and switches quality per-segment
- Master playlist tells the client which URL to use for each quality level
class ABRPlayer:
    """Client-side adaptive bitrate logic (conceptual)."""

    def __init__(self):
        self.buffer_size = 0           # Seconds of video buffered
        self.current_quality = '480p'
        self.download_speeds = []      # Recent segment download speeds (bits/sec)

    def select_quality(self) -> str:
        # Average over the last 5 segments (guard against empty history)
        recent = self.download_speeds[-5:]
        avg_speed = sum(recent) / len(recent) if recent else 0
        # Map measured bandwidth to a quality tier, leaving headroom
        if avg_speed > 10_000_000:
            return '4k'
        elif avg_speed > 4_000_000:
            return '1080p'
        elif avg_speed > 2_000_000:
            return '720p'
        elif avg_speed > 800_000:
            return '480p'
        else:
            return '360p'

    def should_buffer(self) -> bool:
        return self.buffer_size < 15   # Keep ~15 seconds buffered ahead
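A hypothetical driver loop showing how the two methods cooperate; http_get, cdn_base, and total_segments are placeholders, and the segment length matches the 6-second value used by the transcoder:

import time

player = ABRPlayer()
for segment_number in range(total_segments):      # known from the variant playlist
    quality = player.select_quality()             # re-decided per segment
    start = time.monotonic()
    data = http_get(f"{cdn_base}/{quality}/seg{segment_number}.ts")  # placeholder fetch
    elapsed = time.monotonic() - start
    player.download_speeds.append(len(data) * 8 / elapsed)  # observed bits/sec
    player.buffer_size += 6                       # one 6-second segment buffered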
CDN Strategy
YouTube’s CDN (Google Global Cache) uses a two-tier approach:
- ISP-embedded caches: Google negotiates with ISPs to place cache servers inside their networks. Popular videos are pre-seeded. Cache hit rate >95% for top content.
- Edge PoPs: 130+ points of presence worldwide. Cache misses fetch from origin (S3) and populate the edge.
- Cache key: video_id + resolution + segment_number. Segment-level granularity keeps the cache effective even when a user seeks into the middle of a video.
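A minimal sketch of that cache key, assuming the processed/{video_id}/{resolution} layout used by the pipeline above:

def segment_cache_key(video_id: str, resolution: str, segment_number: int) -> str:
    """One cached object per segment: a viewer who seeks to minute 40
    pulls only the segments around minute 40, each individually cacheable."""
    return f"{video_id}/{resolution}/seg{segment_number:05d}.ts"

# Every viewer of this exact segment shares one CDN object (IDs hypothetical)
key = segment_cache_key("vid123", "720p", 417)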
Metadata and Search
-- Video metadata: PostgreSQL or MySQL (fine at small-to-medium scale;
-- shard by video_id beyond that)
CREATE TABLE videos (
    video_id    BIGINT PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    title       VARCHAR(500),
    description TEXT,
    duration    INT,                -- seconds
    view_count  BIGINT DEFAULT 0,
    like_count  BIGINT DEFAULT 0,
    status      VARCHAR(20),        -- processing, published, removed
    created_at  TIMESTAMP,
    INDEX (user_id),                -- inline INDEX is MySQL syntax;
    INDEX (created_at)              -- Postgres uses separate CREATE INDEX
);
Search runs on a separate Elasticsearch index:
- Documents: {video_id, title, description, tags, channel_name}
- Query: multi-match across title (boost: 3), tags (boost: 2), description
- Ranking: BM25 score × recency_boost × view_count_boost
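In Elasticsearch's query DSL, that multi-match looks like the following (the index name and client call are assumptions; ^N is the standard per-field boost syntax):

search_query = {
    "multi_match": {
        "query": "how to design youtube",
        "fields": ["title^3", "tags^2", "description"]  # ^N = field boost
    }
}
# With the official Python client: es.search(index="videos", query=search_query)
# Recency and view-count boosts are applied as a second-pass re-rank.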
View Count Scalability
"""
Naive: UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?
Problem: 11,600 concurrent streams × 1 update/second = 11,600 writes/sec
→ database bottleneck
Solution: Count-min sketch + Redis atomic counters
"""
import time

import redis as redis_lib

redis = redis_lib.Redis(decode_responses=True)  # str in/out instead of bytes

class ViewCountService:
    BATCH_FLUSH_INTERVAL = 60       # Seconds
    BATCH_SIZE_THRESHOLD = 10_000

    def record_view(self, video_id: str, user_id: str):
        # Deduplicate: don't count re-watches within 24h
        key = f"view:{user_id}:{video_id}"
        if redis.set(key, 1, nx=True, ex=86400):  # NX = only set if not exists
            # Increment atomic counter in Redis (serves real-time reads)
            redis.incr(f"view_count:{video_id}")
            # Add to batch queue for the periodic DB flush
            redis.rpush("view_count_batch", f"{video_id}:{int(time.time())}")

    def flush_to_db(self):
        """Runs periodically (every 60s) to batch-update the DB."""
        counts = {}
        batch_size = redis.llen("view_count_batch")
        for _ in range(min(batch_size, self.BATCH_SIZE_THRESHOLD)):
            entry = redis.lpop("view_count_batch")
            if entry is None:       # queue drained by a concurrent flusher
                break
            video_id = entry.split(':')[0]
            counts[video_id] = counts.get(video_id, 0) + 1
        # One UPDATE per video instead of one per view
        for video_id, count in counts.items():
            db.execute(
                "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
                [count, video_id]
            )
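One way to run the flusher is a dedicated worker loop; in production this would be a scheduled job holding a lock so only one flusher drains the queue at a time:

svc = ViewCountService()
while True:
    svc.flush_to_db()
    time.sleep(ViewCountService.BATCH_FLUSH_INTERVAL)  # 60s between flushes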
Recommendation System (Simplified)
- Candidate generation: Collaborative filtering (users with similar watch history); content-based (same channel, tags, topic)
- Ranking: ML model trained on engagement signals (click-through rate, watch percentage, likes, shares)
- Two-stage funnel: Retrieve thousands of candidates → rank top 100 → present top 10 with diversity constraints (sketched after this list)
- Offline training: Daily batch jobs on BigQuery/Spark; model pushed to serving infrastructure
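A skeletal version of that funnel; the candidate sources, feature extraction, ranking model, and diversify helper are all placeholders:

def recommend(user_id: str, k: int = 10) -> list:
    # Stage 1: candidate generation -- cheap, high recall (thousands of videos)
    candidates = set()
    candidates |= collaborative_candidates(user_id)  # placeholder: similar users' history
    candidates |= content_candidates(user_id)        # placeholder: same channels/tags/topics
    # Stage 2: ranking -- the expensive model scores only the shortlist
    scored = [(video, ranking_model.predict(features(user_id, video)))
              for video in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top_100 = [video for video, _ in scored[:100]]
    # Final cut with diversity constraints (e.g. limit videos per channel)
    return diversify(top_100)[:k]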
Interview Discussion Points
- Resume upload: Track uploaded chunks server-side; the client queries which chunks are missing and resumes from there (see the sketch after this list)
- Geographic compliance: Geo-block at CDN level using viewer IP; serve different content per region for licensing
- Copyright detection: Content ID — fingerprint every uploaded video; scan against database of copyrighted content using audio/video fingerprinting (perceptual hash)
- Hot vs cold storage: Popular videos on fast SSDs/CDN; videos with zero views for 6 months moved to Glacier (archival storage)
- Live streaming: Different pipeline — RTMP ingest → real-time segmentation → low-latency HLS (LL-HLS with 2-second segments)
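For the resume-upload point, a minimal sketch against the session state kept by VideoUploadService (the fixed chunk size is an assumption; part numbers are 1-based to match S3):

def get_missing_chunks(upload_id: str, chunk_size: int = 8 * 1024 * 1024) -> list:
    """Client calls this after reconnecting, then re-uploads only the gaps."""
    session = upload_sessions.get(upload_id)
    total_chunks = -(-session['total_size'] // chunk_size)  # ceiling division
    uploaded = set(session['uploaded_chunks'].keys())
    return [n for n in range(1, total_chunks + 1) if n not in uploaded]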