System Design Interview: Design a Video Streaming Platform (YouTube/Netflix)

What Is a Video Streaming Platform?

A video streaming platform stores, processes, and delivers video content to millions of concurrent viewers. Examples: YouTube (500 hours of video uploaded per minute), Netflix (200M+ subscribers), Twitch (live streaming). Core challenges: video transcoding at scale, adaptive bitrate streaming, CDN delivery, and minimizing startup latency and buffering.

    System Requirements

    Functional

    • Upload video: ingest raw video, transcode to multiple resolutions
    • Stream video: adaptive bitrate based on network conditions
    • Search and browse video catalog
    • Track view counts, watch history, recommendations

    Non-Functional

    • 500 hours uploaded per minute; 1B daily views
    • Startup latency <2 seconds globally
    • Seamless quality adaptation during playback

    Upload and Transcoding Pipeline

    User upload ──► Upload Service ──► Raw video in S3
                                            │
                                   Transcoding Queue (SQS)
                                            │
                                 Transcoding Workers (FFmpeg)
                               ┌────────────┴──────────────┐
                               ▼                           ▼
                      Multiple renditions:          Thumbnail extraction
                      1080p, 720p, 480p,            (sample frames)
                      360p, 240p in HLS/DASH
                               │
                        CDN origin (S3/GCS)
    

    Transcoding is CPU-intensive. A 10-minute 4K video takes ~5 minutes to transcode on a single core. Parallelize: split video into 1-minute segments, transcode segments in parallel across workers, reassemble. Spot instances for cost efficiency. Store each rendition as HLS (HTTP Live Streaming) segments: 2-second .ts chunks + a .m3u8 manifest file listing all chunks.
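    The split-and-parallelize step can be sketched in Python. This is a minimal illustration, not a production pipeline: the FFmpeg invocation is stubbed out, and `transcode_segment`, the worker count, and the file-naming scheme are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def segment_ranges(duration_s: int, segment_s: int = 60):
    """Split a video of duration_s seconds into (start, end) ranges
    of at most segment_s seconds each, for parallel transcoding."""
    return [(start, min(start + segment_s, duration_s))
            for start in range(0, duration_s, segment_s)]

def transcode_segment(rng, rendition):
    # In production this would shell out to FFmpeg with -ss/-to flags;
    # here we just return the name of the segment a worker would produce.
    start, _end = rng
    return f"{rendition}/seg_{start:05d}.ts"

def transcode_video(duration_s, renditions, workers=8):
    """Fan every (segment, rendition) pair out across a worker pool,
    then return the ordered segment list per rendition (ready to be
    assembled into an HLS playlist)."""
    ranges = segment_ranges(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {
            r: list(pool.map(lambda rng: transcode_segment(rng, r), ranges))
            for r in renditions
        }
```

    Because `pool.map` preserves input order, the reassembly step is trivial: the returned segment lists are already in playback order.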

    Adaptive Bitrate Streaming (ABR)

    The video player downloads a master manifest (.m3u8) listing the available quality levels, then measures download bandwidth for each 2-second chunk. If the download is fast (bandwidth > bitrate), it switches to a higher quality for the next chunk; if slow, it switches to a lower one. This happens automatically and seamlessly mid-stream: the user gets the highest quality their connection supports without buffering.

    # Master manifest (m3u8)
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    720p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
    480p/playlist.m3u8
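    A minimal client-side selection heuristic over the renditions in this manifest might look like the following. The 0.8 safety factor and the switch-every-chunk policy are illustrative assumptions; real players also weigh buffer occupancy and smooth the bandwidth estimate.

```python
# Renditions from the master manifest above: (label, bitrate in bits/s),
# sorted from highest to lowest quality.
RENDITIONS = [("1080p", 5_000_000), ("720p", 2_500_000), ("480p", 1_000_000)]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Choose the highest-bitrate rendition whose bitrate fits within a
    safety fraction of the measured throughput; fall back to the lowest."""
    for label, bitrate in RENDITIONS:
        if bitrate <= measured_bps * safety:
            return label
    return RENDITIONS[-1][0]
```

    The player would call this after each chunk download with the throughput it just observed, so quality adapts at chunk granularity.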
    

    CDN Architecture

    Video segments are large and cacheable. 95%+ of traffic is served from CDN edge nodes. Upload-to-CDN pipeline: after transcoding, push segments to the CDN origin (S3 bucket). CDN edge nodes (Cloudflare, Akamai, AWS CloudFront) cache segments at PoPs globally. First viewer in a region misses the cache (cold start); all subsequent viewers hit the edge cache. For popular videos: CDN cache hit rate approaches 100%. The origin (S3) only handles the first viewer per edge node.
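    The cold-start behavior can be modeled with a toy edge cache: the first request for a segment key counts as an origin fetch, and every later request is a hit. This is a deliberately simplified model (no TTLs, no eviction) just to show why origin load is proportional to PoP count, not viewer count.

```python
class EdgeCache:
    """Toy model of one CDN PoP: the first request per segment key
    misses and is counted as an origin (S3) fetch; later requests hit."""
    def __init__(self):
        self.store = {}
        self.origin_fetches = 0

    def get(self, key):
        if key not in self.store:
            self.origin_fetches += 1          # cache fill from the origin
            self.store[key] = f"bytes-of-{key}"
        return self.store[key]
```

    A million viewers behind one PoP generate exactly one origin fetch per segment; fifty PoPs generate fifty.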

    Video Metadata Service

    videos: id, creator_id, title, description, duration, status,
            thumbnail_url, created_at, view_count
    video_renditions: video_id, resolution, bitrate, manifest_url, size_bytes
    

    Store metadata in a relational DB (PostgreSQL). view_count updated asynchronously via a Kafka consumer — do not update on every view request (too much write amplification). Batch increment view counts every 60 seconds.
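    The batching consumer can be sketched as an in-memory counter that is flushed on a timer. The `flush` callback and the SQL in the comment are illustrative; in practice this would sit inside a Kafka consumer loop with a 60-second flush interval.

```python
from collections import Counter

class ViewCountBatcher:
    """Accumulate view events in memory (as a Kafka consumer would)
    and emit one batched increment per video on each flush, instead
    of one DB write per view."""
    def __init__(self, flush):
        self.pending = Counter()
        self.flush_fn = flush    # e.g. runs the UPDATE in the comment below

    def record_view(self, video_id):
        self.pending[video_id] += 1

    def flush(self):
        for video_id, n in self.pending.items():
            # UPDATE videos SET view_count = view_count + %s WHERE id = %s
            self.flush_fn(video_id, n)
        self.pending.clear()
```

    A thousand views of one video in a flush window become a single `+1000` write, which is the write-amplification win described above.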

    Resumable Uploads

    Large video files (1GB+) need resumable uploads to handle network interruptions. Protocol: initialize an upload session, get a session URL. Upload in 5MB chunks with byte range headers. Server tracks the last acknowledged byte. On network failure: resume from the last byte. This is the protocol used by YouTube Data API and GCS resumable uploads.
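    The server-side bookkeeping reduces to tracking the last acknowledged byte. A minimal in-memory sketch, assuming sequential 5MB chunks (real implementations key this off `Content-Range` headers and persist the offset):

```python
class UploadSession:
    """Server side of a resumable upload: accept chunks by byte offset,
    advance the last acknowledged byte, and tell the client where to
    resume if a chunk arrives out of order (e.g. a retransmit)."""
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.received = 0            # last acknowledged byte offset
        self.buf = bytearray()

    def put_chunk(self, offset, data):
        if offset != self.received:  # stale or out-of-order chunk
            return self.received     # client resumes from this offset
        self.buf.extend(data)
        self.received += len(data)
        return self.received

    @property
    def complete(self):
        return self.received == self.total
```

    After a network failure, the client asks the server for `received` and resumes from that byte instead of restarting the whole 1GB upload.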

    Recommendations

    Two-stage pipeline: candidate retrieval (collaborative filtering: users who watched this also watched X) → ranking (ML model scoring candidates by predicted watch probability, weighted by recency and diversity). Store user watch history in Cassandra (write-heavy, time-series). Train recommendation models offline (daily batch), serve from a feature store with real-time features (what did the user watch in the last hour).
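    The candidate-retrieval stage ("users who watched this also watched X") can be illustrated with a co-watch counter over per-user histories. This is a toy version of collaborative filtering; production systems precompute these counts offline rather than scanning histories at request time.

```python
from collections import Counter

def co_watch_candidates(watch_history, seed_video, k=3):
    """Candidate retrieval: among users who watched seed_video,
    count what else they watched and return the top-k co-watched videos.
    watch_history maps user_id -> list of watched video ids."""
    counts = Counter()
    for videos in watch_history.values():
        if seed_video in videos:
            counts.update(v for v in videos if v != seed_video)
    return [v for v, _ in counts.most_common(k)]
```

    The ranking stage would then score these candidates with the ML model rather than serving them in raw co-watch order.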

    Live Streaming Differences

    Live streaming adds latency constraints: HLS has 15-30 second latency (segment duration buffering). Low-latency HLS (LLHLS): 2-3 seconds. WebRTC: sub-second. Live segments are not cached aggressively — they expire in seconds. The ingest path: streamer → RTMP → ingest server → transcode on the fly → push to CDN → viewers.

    Interview Tips

    • HLS segments + CDN is the core architecture — describe it early.
    • Transcoding parallelism (split into segments) shows depth.
    • ABR is a client-side algorithm — the server just provides multiple renditions.
    • view_count batching via Kafka avoids DB write amplification.

    Frequently Asked Questions

    How does adaptive bitrate streaming (HLS) work and why does it prevent buffering?

    HLS (HTTP Live Streaming) divides a video into small segments (2-4 seconds each) encoded at multiple bitrates (e.g., 240p at 400Kbps, 720p at 2.5Mbps, 1080p at 5Mbps). The player downloads a master manifest listing all available quality levels, selects a quality playlist, and begins downloading segments sequentially over plain HTTP. After each segment download, the player measures the actual throughput. If throughput comfortably exceeds the current bitrate, it switches to a higher quality level for the next segment; if throughput drops, it switches lower. This segment-level adaptation means quality changes happen every 2-4 seconds — never mid-frame. Why it prevents buffering: the player maintains a buffer (e.g., 30 seconds of video ahead), and if it detects bandwidth dropping it switches to a lower bitrate before the buffer drains. The buffer acts as a shock absorber. Contrast with progressive download of a single file: no quality adaptation, and playback stalls completely on bandwidth drops. HLS and DASH (used by YouTube and Netflix) enable smooth streaming on variable connections.

    How do you design the video transcoding pipeline to handle 500 hours uploaded per minute?

    At 500 hours/minute, assuming average 30-minute videos at 1GB raw: 1000 videos/minute, or about 17 uploads/second. Each video needs 5 renditions (1080p, 720p, 480p, 360p, 240p). With single-threaded transcoding, a 30-minute 1080p video takes ~30 minutes to transcode — a throughput of 1 video per 30 minutes per worker, so keeping up requires 1000 * 30 = 30,000 worker-minutes per minute of uploads. Solutions: (1) Split each video into 1-minute segments (30 per video), transcode all segments in parallel across workers, and reassemble into one HLS playlist. (2) Auto-scale transcoding workers (EC2 Spot instances: ~70% cheaper). (3) Priority queue: transcode shorter videos first (better user experience); longer videos wait. (4) Tiered transcoding: immediately transcode 360p (lowest quality, fastest) so the video is viewable within ~2 minutes, with higher-quality renditions following. (5) Codec acceleration: GPU-accelerated encoding (NVENC for H.264/H.265) is 5-10x faster than CPU. At scale, this pipeline keeps per-video transcoding lag under 5 minutes even at 500 hours/minute.

    How does a CDN serve video at global scale and what happens on a cache miss?

    A CDN (Content Delivery Network) has Points of Presence (PoPs) in 200+ cities globally, each caching video segments. When a viewer in Tokyo requests a segment, DNS resolves to the nearest Tokyo CDN edge. The edge checks its cache: on a hit, it serves the segment directly with no origin request; on a miss, it fetches the segment from the origin (an S3 bucket in us-east-1), caches it, and serves it. Subsequent Tokyo viewers get the cached version. Cache hit rate for popular videos is 95%+, and the origin only handles the first viewer per edge per segment. For a viral video with 1M concurrent viewers spread across 50 PoPs, S3 handles ~50 requests per segment (one cold start per PoP): 50 fills * 5MB/segment = 250MB from S3, while the CDN handles the other 999,950 requests. CDN cost model: charge per GB delivered from edge (cheap) plus per GB transferred from origin to CDN (more expensive), so maximizing cache hit rate minimizes origin cost. For 4K catalogs too large to fit in every PoP, use tiered CDN caching (a regional cache above the local edge caches).
