System Design: Video Processing Pipeline (YouTube/Netflix) — Transcoding, HLS, and Scaling

The Video Upload and Processing Problem

YouTube processes 500 hours of video every minute. Each uploaded video must be transcoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K), multiple formats (MP4/H.264, WebM/VP9, HLS segments for adaptive streaming), and have thumbnails generated — all before the video can be served. The challenge: decoupled async processing at massive scale with fault tolerance.

High-Level Architecture

User Browser
    │
    ├─[1] Upload raw video → Object Store (S3 "raw" bucket)
    │      via presigned URL — bypasses API servers
    │
API Server
    ├─[2] Create video metadata (title, status=PROCESSING)
    ├─[3] Publish "video.uploaded" event → Kafka
    │
Transcoding Workers (consume Kafka)
    ├─[4a] Download raw video from S3
    ├─[4b] Transcode to all target resolutions (FFmpeg)
    ├─[4c] Upload transcoded files → S3 "transcoded" bucket
    ├─[4d] Generate thumbnails
    ├─[5] Publish "video.transcoded" event → Kafka
    │
Post-Processing Workers
    ├─[6] Update video status → PUBLISHED
    ├─[7] Invalidate CDN cache for video page
    └─[8] Trigger search index update
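The events that connect these stages are plain messages on Kafka. A minimal sketch of what a "video.uploaded" event might carry (the field names here are illustrative assumptions, not a fixed schema from this design):

```python
import json
import time
import uuid

def make_video_uploaded_event(video_id: str, s3_key: str) -> str:
    """Build the 'video.uploaded' event published in step [3].

    Illustrative payload: an event id so consumers can deduplicate,
    the S3 location of the raw upload, and an upload timestamp.
    """
    event = {
        'event_id': str(uuid.uuid4()),   # lets consumers deduplicate retries
        'type': 'video.uploaded',
        'video_id': video_id,
        'bucket': 'my-raw-videos',       # assumed bucket name
        'key': s3_key,
        'uploaded_at': int(time.time()),
    }
    return json.dumps(event)
```

Transcoding workers deserialize this payload, download `bucket/key`, and publish a matching "video.transcoded" event when done.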

Upload via Presigned URL

Never route large file uploads through your API servers — it wastes bandwidth and CPU. Instead: (1) Client requests a presigned S3 URL from the API server. (2) API server generates a time-limited URL (e.g. 15 minutes) directly from S3. (3) Client uploads the raw video directly to S3. (4) An S3 event notification tells the API server to mark the upload as received. This keeps API servers thin and lets S3 absorb the upload bandwidth.

import boto3
from datetime import timedelta

def generate_upload_url(video_id: str) -> dict:
    s3 = boto3.client('s3')
    key = f"raw/{video_id}/original.mp4"
    url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': 'my-raw-videos', 'Key': key, 'ContentType': 'video/mp4'},
        ExpiresIn=900,  # 15 minutes
    )
    return {'upload_url': url, 'video_id': video_id, 'key': key}

Transcoding with FFmpeg

FFmpeg is the standard open-source tool for video transcoding. Each target resolution is a separate FFmpeg invocation (or a single pass with multiple outputs).

import subprocess
import os

RESOLUTIONS = [
    {'name': '360p',  'width': 640,  'height': 360,  'bitrate': '800k'},
    {'name': '720p',  'width': 1280, 'height': 720,  'bitrate': '2500k'},
    {'name': '1080p', 'width': 1920, 'height': 1080, 'bitrate': '5000k'},
]

def transcode(input_path: str, output_dir: str, video_id: str) -> list:
    outputs = []
    for res in RESOLUTIONS:
        output_path = os.path.join(output_dir, f"{res['name']}.mp4")
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f"scale={res['width']}:{res['height']}",
            '-b:v', res['bitrate'],
            '-c:v', 'libx264', '-c:a', 'aac',
            '-movflags', 'faststart',   # moov atom at start for fast seek
            '-y', output_path
        ]
        subprocess.run(cmd, check=True)
        outputs.append({'resolution': res['name'], 'path': output_path})
    return outputs

def generate_thumbnail(input_path: str, output_path: str, timestamp: str = '00:00:05'):
    subprocess.run([
        'ffmpeg', '-i', input_path,
        '-ss', timestamp, '-vframes', '1',
        '-q:v', '2', '-y', output_path
    ], check=True)

Adaptive Bitrate Streaming (HLS)

Modern video players use adaptive streaming: they download short segments (2-10 seconds) and dynamically switch quality based on available bandwidth. HLS (HTTP Live Streaming) segments video into .ts chunks and serves an M3U8 playlist.

# Generate HLS segments for all resolutions
def create_hls(input_path: str, output_dir: str):
    os.makedirs(output_dir, exist_ok=True)
    # Create per-resolution playlists
    for res in RESOLUTIONS:
        res_dir = os.path.join(output_dir, res['name'])
        os.makedirs(res_dir, exist_ok=True)
        subprocess.run([
            'ffmpeg', '-i', input_path,
            '-vf', f"scale={res['width']}:{res['height']}",
            '-b:v', res['bitrate'], '-c:v', 'libx264', '-c:a', 'aac',
            '-hls_time', '6',          # 6-second segments
            '-hls_playlist_type', 'vod',
            '-hls_segment_filename', os.path.join(res_dir, 'segment_%03d.ts'),
            os.path.join(res_dir, 'index.m3u8')
        ], check=True)
    # Create master playlist referencing all resolutions
    master_playlist = "#EXTM3U\n"
    for res in RESOLUTIONS:
        bandwidth = int(res['bitrate'][:-1]) * 1000  # '800k' -> 800000 bits/s
        master_playlist += f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={res["width"]}x{res["height"]}\n'
        master_playlist += f'{res["name"]}/index.m3u8\n'
    with open(os.path.join(output_dir, 'master.m3u8'), 'w') as f:
        f.write(master_playlist)
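For reference, this is the standard HLS master-playlist format; for the three entries in RESOLUTIONS above it should come out as:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
```

The player reads this file first, then picks a variant playlist based on measured bandwidth.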

Worker Scaling and Fault Tolerance

  • Kafka consumer groups: each transcoding worker is a consumer in the same consumer group. Kafka assigns partitions across workers for parallel processing. If a worker crashes, its partitions are rebalanced to healthy workers.
  • Idempotent workers: workers check if transcoded files already exist in S3 before transcoding (S3 HeadObject). If the worker crashes mid-transcode and restarts, it safely re-transcodes (overwriting the partial output).
  • Dead letter queue: after N failed transcode attempts, move the event to a DLQ for manual inspection and alerting.
  • Progress tracking: for long transcodes (4K video = hours), periodically update the job’s progress in Redis (percent complete) so the UI can show progress bars.
  • Spot/preemptible instances: transcoding is CPU-intensive but stateless (can restart from scratch). Use AWS Spot instances or GCP preemptible VMs — 60-90% cost reduction. If the instance is preempted, Kafka offset remains uncommitted and another worker picks up the job.
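The idempotency and dead-letter bullets combine into a small control loop. A minimal sketch (not from the source): `transcode_fn` stands in for the real FFmpeg call and `already_done` for the S3 HeadObject existence check.

```python
MAX_ATTEMPTS = 3  # after this many failures, the event is dead-lettered

def handle_event(event, transcode_fn, already_done, dead_letters):
    """Process one 'video.uploaded' event with idempotency and bounded retries."""
    if already_done(event['video_id']):
        return 'skipped'                      # output already in S3: safe no-op
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            transcode_fn(event['video_id'])
            return 'done'
        except Exception:
            if attempt == MAX_ATTEMPTS:
                dead_letters.append(event)    # route to DLQ for inspection
                return 'dead-lettered'
    return 'unreachable'
```

Because the existence check runs before any work, a worker that restarts after a crash either skips finished videos or cleanly redoes unfinished ones.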

CDN Integration

After transcoding, video segments are served from a CDN (CloudFront, Fastly). The CDN caches segments globally — users stream from the nearest edge node. Videos are “push” cached (pre-warmed at edge) for popular content and “pull” cached (cached on first request) for long-tail content.

Interview Questions

Q: How do you handle a 4-hour video upload and transcoding?

For large files: use S3 multipart upload (parallel parts, 5 MB minimum part size, resumable on failure). For transcoding: split the video into temporal chunks, transcode each chunk independently in parallel, then concatenate. A 4-hour video split into 10-minute chunks = 24 parallel transcoding jobs, reducing end-to-end latency from hours to minutes. Use a distributed job queue (Kafka + worker pool) to fan out the chunks; track chunk completion and merge when all chunks are done.
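The chunking step reduces to planning fixed time windows that each become a per-chunk FFmpeg job (e.g. via `-ss`/`-to`). A minimal sketch of the planner:

```python
import math

def plan_chunks(duration_s: int, chunk_s: int = 600) -> list:
    """Split a video timeline into (start_s, end_s) windows for parallel
    transcoding. Illustrative helper: each window would be dispatched as
    one job, and outputs concatenated once all windows complete."""
    n = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(n)]
```

For a 4-hour video, `plan_chunks(4 * 3600)` yields 24 ten-minute windows, matching the 24 parallel jobs above.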

Q: How do you estimate the compute cost of transcoding?

Transcoding on CPU runs at roughly 1x to 3x real time (encoding 1 minute of video takes 1-3 CPU-minutes). For 500 hours of video uploaded per minute: 500 * 60 = 30,000 minutes of raw video per minute. With 4 resolutions: 120,000 minutes of transcoding work per minute. Even at an optimistic 3x real-time encoding speed (1/3 CPU-minute per video-minute), that is 40,000 CPU-minutes of work arriving every wall-clock minute, which means ~40,000 CPU cores running constantly just to keep up. In practice, use GPU-accelerated transcoding (NVENC), which is 10-50x faster than CPU encoding and dramatically reduces cost.
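The same back-of-envelope estimate as code: work arriving per wall-clock minute, divided by encoding speed, directly gives the steady-state core count.

```python
def cpu_cores_needed(hours_uploaded_per_min: float, resolutions: int,
                     encode_speed_x: float) -> float:
    """Steady-state cores = CPU-minutes of transcoding work arriving
    per wall-clock minute. encode_speed_x is the encoder's speed as a
    multiple of real time (3.0 = encodes 3 video-minutes per CPU-minute)."""
    video_min_per_min = hours_uploaded_per_min * 60
    work_min_per_min = video_min_per_min * resolutions
    return work_min_per_min / encode_speed_x
```

Plugging in the numbers above gives 40,000 cores at 3x real-time speed, and 120,000 at 1x.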

Asked at: Netflix, Uber, Databricks, Twitter/X, Cloudflare
