System Design: Video Processing Pipeline (YouTube/Netflix) — Transcoding, HLS, and Scaling

The Video Upload and Processing Problem

YouTube processes 500 hours of video every minute. Each uploaded video must be transcoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K), multiple formats (MP4/H.264, WebM/VP9, HLS segments for adaptive streaming), and have thumbnails generated — all before the video can be served. The challenge: decoupled async processing at massive scale with fault tolerance.

High-Level Architecture

User Browser
    │
    ├─[1] Upload raw video → Object Store (S3 "raw" bucket)
    │      via presigned URL — bypasses API servers
    │
API Server
    ├─[2] Create video metadata (title, status=PROCESSING)
    ├─[3] Publish "video.uploaded" event → Kafka
    │
Transcoding Workers (consume Kafka)
    ├─[4a] Download raw video from S3
    ├─[4b] Transcode to all target resolutions (FFmpeg)
    ├─[4c] Upload transcoded files → S3 "transcoded" bucket
    ├─[4d] Generate thumbnails
    ├─[5] Publish "video.transcoded" event → Kafka
    │
Post-Processing Workers
    ├─[6] Update video status → PUBLISHED
    ├─[7] Invalidate CDN cache for video page
    └─[8] Trigger search index update
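
The worker side of steps [3]-[5] can be sketched as a minimal consume-and-fan-out loop. This is a sketch, not the design's actual code: the kafka-python package is assumed, and `plan_transcode_tasks` / `run_transcode` are illustrative names.

```python
TARGETS = ['360p', '720p', '1080p']

def plan_transcode_tasks(event: dict) -> list:
    """Expand one video.uploaded event into one task per target resolution."""
    video_id = event['video_id']
    return [
        {
            'video_id': video_id,
            'resolution': res,
            'input_key': f"raw/{video_id}/original.mp4",
            'output_key': f"transcoded/{video_id}/{res}.mp4",
        }
        for res in TARGETS
    ]

if __name__ == '__main__':
    # Consume loop (requires a Kafka broker and the kafka-python package):
    # import json
    # from kafka import KafkaConsumer
    # consumer = KafkaConsumer('video.uploaded', group_id='transcoders',
    #                          enable_auto_commit=False)
    # for msg in consumer:
    #     for task in plan_transcode_tasks(json.loads(msg.value)):
    #         run_transcode(task)   # hypothetical: download, FFmpeg, upload
    #     consumer.commit()         # commit offset only after success
    pass
```

Note the manual offset commit: committing only after all resolutions succeed is what lets a crashed worker's job be redelivered to a healthy one.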

Upload via Presigned URL

Never route large file uploads through your API servers — it wastes bandwidth and CPU. Instead: (1) Client requests a presigned S3 URL from the API server. (2) API server generates a time-limited URL (15 minutes) directly from S3. (3) Client uploads the raw video directly to S3. (4) S3 event triggers a notification → API server marks upload as received. This keeps API servers thin and uses S3’s upload bandwidth directly.

import boto3

def generate_upload_url(video_id: str) -> dict:
    s3 = boto3.client('s3')
    key = f"raw/{video_id}/original.mp4"
    url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': 'my-raw-videos', 'Key': key, 'ContentType': 'video/mp4'},
        ExpiresIn=900,  # 15 minutes
    )
    return {'upload_url': url, 'video_id': video_id, 'key': key}

Transcoding with FFmpeg

FFmpeg is the standard open-source tool for video transcoding. Each target resolution is a separate FFmpeg invocation (or a single pass with multiple outputs).

import subprocess
import os

RESOLUTIONS = [
    {'name': '360p',  'width': 640,  'height': 360,  'bitrate': '800k'},
    {'name': '720p',  'width': 1280, 'height': 720,  'bitrate': '2500k'},
    {'name': '1080p', 'width': 1920, 'height': 1080, 'bitrate': '5000k'},
]

def transcode(input_path: str, output_dir: str, video_id: str) -> list:
    outputs = []
    for res in RESOLUTIONS:
        output_path = os.path.join(output_dir, f"{res['name']}.mp4")
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f"scale={res['width']}:{res['height']}",
            '-b:v', res['bitrate'],
            '-c:v', 'libx264', '-c:a', 'aac',
            '-movflags', 'faststart',   # moov atom at start for fast seek
            '-y', output_path
        ]
        subprocess.run(cmd, check=True)
        outputs.append({'resolution': res['name'], 'path': output_path})
    return outputs

def generate_thumbnail(input_path: str, output_path: str, timestamp: str = '00:00:05'):
    subprocess.run([
        'ffmpeg', '-i', input_path,
        '-ss', timestamp, '-vframes', '1',
        '-q:v', '2', '-y', output_path
    ], check=True)
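
Workers also typically probe the source before transcoding to record its duration, resolution, and codec. A sketch using ffprobe (ships alongside FFmpeg and must be on PATH); `summarize_probe` is an illustrative helper, not part of FFmpeg:

```python
import json
import subprocess

def probe_metadata(input_path: str) -> dict:
    """Run ffprobe on the source file and return a metadata summary."""
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-show_format', '-show_streams',
         '-of', 'json', input_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return summarize_probe(json.loads(out))

def summarize_probe(probe: dict) -> dict:
    """Pull out the fields the pipeline records: duration, resolution, codec."""
    video = next(s for s in probe['streams'] if s['codec_type'] == 'video')
    return {
        'duration_sec': float(probe['format']['duration']),
        'width': video['width'],
        'height': video['height'],
        'codec': video['codec_name'],
    }
```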

Adaptive Bitrate Streaming (HLS)

Modern video players use adaptive streaming: they download short segments (2-10 seconds) and dynamically switch quality based on available bandwidth. HLS (HTTP Live Streaming) segments video into .ts chunks and serves an M3U8 playlist.

# Generate HLS segments for all resolutions
def create_hls(input_path: str, output_dir: str):
    os.makedirs(output_dir, exist_ok=True)
    # Create per-resolution playlists
    for res in RESOLUTIONS:
        res_dir = os.path.join(output_dir, res['name'])
        os.makedirs(res_dir, exist_ok=True)
        subprocess.run([
            'ffmpeg', '-i', input_path,
            '-vf', f"scale={res['width']}:{res['height']}",
            '-b:v', res['bitrate'], '-c:v', 'libx264', '-c:a', 'aac',
            '-hls_time', '6',          # 6-second segments
            '-hls_playlist_type', 'vod',
            '-hls_segment_filename', os.path.join(res_dir, 'segment_%03d.ts'),
            os.path.join(res_dir, 'index.m3u8')
        ], check=True)
    # Create master playlist referencing all resolutions
    master_playlist = "#EXTM3U\n"
    for res in RESOLUTIONS:
        bandwidth = int(res['bitrate'][:-1]) * 1000   # '2500k' -> 2500000
        master_playlist += (
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={res['width']}x{res['height']}\n"
        )
        master_playlist += f"{res['name']}/index.m3u8\n"
    with open(os.path.join(output_dir, 'master.m3u8'), 'w') as f:
        f.write(master_playlist)

Worker Scaling and Fault Tolerance

  • Kafka consumer groups: each transcoding worker is a consumer in the same consumer group. Kafka assigns partitions across workers for parallel processing. If a worker crashes, its partitions are rebalanced to healthy workers.
  • Idempotent workers: workers check if transcoded files already exist in S3 before transcoding (S3 HeadObject). If the worker crashes mid-transcode and restarts, it safely re-transcodes (overwriting the partial output).
  • Dead letter queue: after N failed transcode attempts, move the event to a DLQ for manual inspection and alerting.
  • Progress tracking: for long transcodes (4K video = hours), periodically update the job’s progress in Redis (percent complete) so the UI can show progress bars.
  • Spot/preemptible instances: transcoding is CPU-intensive but stateless (can restart from scratch). Use AWS Spot instances or GCP preemptible VMs — 60-90% cost reduction. If the instance is preempted, Kafka offset remains uncommitted and another worker picks up the job.

CDN Integration

After transcoding, video segments are served from a CDN (CloudFront, Fastly). The CDN caches segments globally — users stream from the nearest edge node. Videos are “push” cached (pre-warmed at edge) for popular content and “pull” cached (cached on first request) for long-tail content.
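
The invalidation step [7] from the architecture can be sketched as below. The path layout (/watch/..., master.m3u8) is illustrative, and the commented call assumes boto3 and a CloudFront distribution:

```python
import time

def build_invalidation_batch(video_id: str) -> dict:
    """InvalidationBatch payload covering the video page and its HLS manifest."""
    paths = [f"/watch/{video_id}", f"/videos/{video_id}/master.m3u8"]
    return {
        'Paths': {'Quantity': len(paths), 'Items': paths},
        # CallerReference deduplicates retried requests
        'CallerReference': f"{video_id}-{int(time.time())}",
    }

# Real call (assumes boto3 and a CloudFront distribution ID):
# import boto3
# cf = boto3.client('cloudfront')
# cf.create_invalidation(DistributionId='EDFDVBD6EXAMPLE',
#                        InvalidationBatch=build_invalidation_batch('v123'))
```

Invalidating only the page and manifest (not the immutable .ts segments) keeps invalidation costs low: segments get new paths per video, so they never need busting.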

Interview Questions

Q: How do you handle a 4-hour video upload and transcoding?

For large files: use multipart upload (S3 multipart splits the file into parts of 5 MB or more, uploaded in parallel and individually retryable, so the upload is resumable on failure). For transcoding: split the video into temporal segments (chunk by time), transcode each chunk independently in parallel, then concatenate. A 4-hour video split into 10-minute chunks = 24 parallel transcoding jobs, which reduces end-to-end latency from hours to minutes. Use a distributed job queue (Kafka + worker pool) to parallelize. Track chunk completion; merge when all chunks are done.
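
The chunk-and-merge approach can be sketched as follows. `plan_chunks` is an illustrative helper; the FFmpeg commands in the comments are the standard split/concat pattern, though clean splitting in practice requires care around keyframe alignment:

```python
def plan_chunks(duration_sec: float, chunk_sec: float = 600.0) -> list:
    """Split a video's timeline into (start, length) pairs for parallel transcode."""
    chunks, start = [], 0.0
    while start < duration_sec:
        chunks.append((start, min(chunk_sec, duration_sec - start)))
        start += chunk_sec
    return chunks

# Per-chunk worker job (re-encodes its slice of the timeline):
#   ffmpeg -ss {start} -i input.mp4 -t {length} -c:v libx264 ... chunk_{i}.mp4
# Merge once every chunk reports done, using the concat demuxer:
#   ffmpeg -f concat -safe 0 -i chunklist.txt -c copy output.mp4
```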

Q: How do you estimate the compute cost of transcoding?

Transcoding speed varies from roughly 3x faster than real-time (fast presets, low resolutions) to 3x slower (high-quality 1080p/4K). For 500 hours of video per minute: 500 * 60 = 30,000 minutes of raw video arriving per minute. With 4 resolutions: 120,000 output minutes per minute. Even at the optimistic 3x-faster end, that is 40,000 CPU-minutes of work per minute, i.e. ~40,000 cores running constantly (one CPU-minute of work arriving per minute keeps one core permanently busy). In practice, use GPU-accelerated transcoding (NVENC), which is 10-50x faster than CPU encoding and dramatically reduces cost.
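
As a quick sanity check, the back-of-envelope arithmetic:

```python
raw_min_per_min = 500 * 60                     # 500 hours uploaded per minute = 30,000 min
variants = 4
work_min_per_min = raw_min_per_min * variants  # 120,000 output minutes per minute
# CPU-minutes of work arriving per minute equals the number of cores
# that must run continuously to keep up.
cores_at_1x = work_min_per_min                 # encoding at 1:1 real-time
cores_at_3x = work_min_per_min // 3            # encoder running 3x faster than real-time
```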

Q: How does YouTube handle video upload and processing at scale?

The client requests a presigned URL from the upload service and uploads the video directly to object storage (S3/GCS), bypassing app servers. On upload completion, S3 fires an event to a message queue (SQS/Kafka). Transcoding workers consume these events and process each video: extract metadata (duration, resolution, codec), then transcode to multiple resolutions (360p, 480p, 720p, 1080p, 4K) using FFmpeg. Each resolution is a separate task, parallelizable across workers. Thumbnails are extracted at configurable timestamps. Output segments and manifests are written to a CDN origin bucket. The video record is updated to AVAILABLE only after all required resolutions are processed.

Q: What is HLS adaptive streaming and how does it work?

HLS (HTTP Live Streaming) splits a video into small segments (2-6 seconds each) and generates an M3U8 playlist file per resolution. A master playlist lists all variant streams with their bandwidth and resolution. The video player downloads the master playlist, then selects the variant stream matching available bandwidth. During playback, the player continuously monitors download speed and buffer level: if bandwidth drops, it switches to a lower-resolution playlist; if bandwidth improves, it switches up. Each resolution is independently segmented so the player can switch at any segment boundary. This allows smooth playback even on variable-quality connections.

Q: How do you prevent duplicate video processing when a worker crashes mid-job?

Use the message queue's visibility timeout. When a worker picks up a transcoding job, the message becomes invisible to other consumers for N minutes (e.g., 30 minutes for a long transcode). If the worker crashes, the message reappears after the timeout and is reprocessed by another worker. Make transcoding idempotent: output files are written to deterministic paths (video_id/720p/segment_001.ts), so a re-run overwrites the same files. Track progress in a jobs table: status=PROCESSING with worker_id and started_at. On crash recovery, the new worker picks up the message, checks the jobs table, and resumes from the last completed segment checkpoint. On success, status=DONE and the message is deleted.

Q: How do you scale the transcoding pipeline to handle 500 hours of video uploaded per minute?

Use a worker pool auto-scaled by queue depth. Each transcoding job (one video, one resolution) is an independent task; a 10-minute video at 720p takes about 2 minutes of CPU time. 500 hours/minute of uploads is roughly 3,000 ten-minute videos per minute; with 4 resolution variants each, that is ~12,000 transcoding jobs per minute, each taking about 2 minutes, so roughly 24,000 concurrent workers are needed at peak. Use spot/preemptible instances for transcoding (~70% cost savings) and handle preemption with job checkpointing. Use separate queues for priority tiers: premium users go to a fast queue served by on-demand instances; free uploads go to the spot queue. The CDN caches all segments, so playback load does not hit the origin.

Q: How do you implement resumable video uploads for large files?

Use multipart upload (S3 multipart or the tus protocol). The client splits the file into chunks (5-50 MB each) and uploads each chunk independently with a part number. S3 stores parts until CompleteMultipartUpload is called with the list of ETags. If a chunk fails, only that chunk is retried, not the entire file. The client tracks which parts succeeded (e.g., in localStorage); on browser restart, it queries which parts the server already has and resumes from the first missing chunk. Incomplete multipart uploads persist until completed or aborted, so set a lifecycle rule to abort them after e.g. 24 hours to avoid storage leakage.
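
The multipart flow above can be sketched as follows. `plan_parts` is an illustrative helper; the commented S3 calls assume boto3:

```python
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB parts (S3 minimum is 5 MB except the last)

def plan_parts(file_size: int, chunk_size: int = CHUNK_SIZE) -> list:
    """(part_number, offset, length) for each part; S3 part numbers start at 1."""
    parts = []
    offset, part_no = 0, 1
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        parts.append((part_no, offset, length))
        offset += length
        part_no += 1
    return parts

# Server-side flow (assumes boto3):
# s3 = boto3.client('s3')
# mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
# etags = {}
# for part_no, offset, length in plan_parts(size):
#     resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
#                           PartNumber=part_no, Body=read_chunk(offset, length))
#     etags[part_no] = resp['ETag']     # retry only failed parts
# s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
#     MultipartUpload={'Parts': [{'PartNumber': n, 'ETag': e}
#                                for n, e in sorted(etags.items())]})
```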

Asked at: Netflix, Uber, Databricks, Twitter/X, Cloudflare
