System Design Interview: Design YouTube / Video Streaming Platform

Designing YouTube is one of the most comprehensive system design problems — it covers video upload pipelines, adaptive bitrate streaming, CDN architecture, recommendation systems, and massive scale storage. It’s frequently asked at Netflix, YouTube/Google, Meta, and Twitch.

Functional Requirements

  • Upload videos (any format, up to several hours long)
  • Stream videos to users globally with adaptive quality
  • Search videos by title, description, and tags
  • Like, comment, and subscribe to channels
  • Personalized video recommendations
  • Track view counts and watch history

Non-Functional Requirements

  • Scale: 500 hours of video uploaded per minute; 1B+ hours watched per day
  • Storage: ~500hr/min × 60min × 1GB/hr = ~30TB raw video per hour
  • Read-heavy: upload:watch ratio ≈ 1:200
  • Low latency streaming: <2s startup time; buffer-free playback
  • Global availability: serve users in 190+ countries

Capacity Estimation

  • Uploads: 500 hours/min ≈ 8.3 hours/sec → ~8.3GB/sec raw video ingestion (~30TB/hour, matching the storage estimate above)
  • Processed storage: ~30TB/hour raw × 5 resolutions × 0.3 compression ratio ≈ 45TB of processed video per hour of uploads
  • Views: 1B hours/day ÷ 86,400s ≈ 11,600 video-hours streamed per second (~42M concurrent streams on average); peak ~3×
  • Metadata: 800M videos × 1KB = 800GB for video metadata
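
These estimates are quick to sanity-check in code. A minimal back-of-envelope script, assuming roughly 1GB per hour of raw video and the figures above:

# Back-of-envelope capacity check (assumes ~1GB per raw video-hour,
# 5 output resolutions, and a 0.3 average compression ratio vs. raw)
UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_VIDEO_HOUR = 1
NUM_RESOLUTIONS = 5
COMPRESSION_RATIO = 0.3
WATCH_HOURS_PER_DAY = 1_000_000_000

raw_gb_per_sec = UPLOAD_HOURS_PER_MIN * RAW_GB_PER_VIDEO_HOUR / 60
raw_tb_per_hour = UPLOAD_HOURS_PER_MIN * 60 * RAW_GB_PER_VIDEO_HOUR / 1000
processed_tb_per_hour = raw_tb_per_hour * NUM_RESOLUTIONS * COMPRESSION_RATIO
video_hours_per_sec = WATCH_HOURS_PER_DAY / 86_400
avg_concurrent_streams = WATCH_HOURS_PER_DAY / 24

print(f"Raw ingest:  {raw_gb_per_sec:.1f} GB/s ({raw_tb_per_hour:.0f} TB/hour)")
print(f"Processed:   {processed_tb_per_hour:.0f} TB/hour of new storage")
print(f"Streaming:   {video_hours_per_sec:,.0f} video-hours/s, "
      f"~{avg_concurrent_streams / 1e6:.0f}M concurrent streams on average")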

High-Level Architecture

[Upload Client]
      │ (chunked multipart)
      ▼
[Upload Service] ──► [Raw Video Storage (S3)]
      │                      │
      │              [Processing Queue (SQS)]
      │                      │
      │            [Video Processing Workers]
      │             (FFmpeg: transcode to
      │              360p, 480p, 720p, 1080p, 4K)
      │                      │
      │            [Processed Video Storage (S3)]
      │                      │
      │              [CDN (CloudFront/Akamai)]
      │                      │
[Metadata DB]         [Streaming Clients]
(video title,          (HLS/DASH adaptive
 duration, tags)       bitrate playback)

Video Upload Pipeline

class VideoUploadService:
    """
    Chunked upload to handle large files reliably.
    Client splits video into 5-10MB chunks, uploads each independently.
    Server reassembles and hands off to processing pipeline.
    """
    def initiate_upload(self, user_id: str, filename: str,
                        total_size: int) -> dict:
        video_id = generate_snowflake_id()
        upload_id = s3.create_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original"
        )['UploadId']

        # Store upload session
        upload_sessions.set(upload_id, {
            'video_id': video_id,
            'user_id': user_id,
            'filename': filename,
            'total_size': total_size,
            'uploaded_chunks': {},
            'status': 'uploading'
        }, ttl=86400)  # 24h to complete

        return {'video_id': video_id, 'upload_id': upload_id}

    def upload_chunk(self, upload_id: str, chunk_number: int,
                     chunk_data: bytes) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']

        response = s3.upload_part(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            PartNumber=chunk_number,
            Body=chunk_data
        )
        session['uploaded_chunks'][chunk_number] = response['ETag']
        upload_sessions.set(upload_id, session)

        return {'chunk': chunk_number, 'etag': response['ETag']}

    def complete_upload(self, upload_id: str) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']

        # Complete S3 multipart upload
        parts = [{'PartNumber': n, 'ETag': etag}
                for n, etag in sorted(session['uploaded_chunks'].items())]
        s3.complete_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )

        # Queue for processing
        processing_queue.publish({
            'video_id': video_id,
            'user_id': session['user_id'],
            's3_key': f"{video_id}/original"
        })
        return {'video_id': video_id, 'status': 'processing'}
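
On the client side, the same session supports resuming an interrupted upload (see also the resume point under Interview Discussion Points). A minimal client sketch, assuming a hypothetical api client whose get_missing_chunks call returns the part numbers the server has not received yet:

import math

CHUNK_SIZE = 8 * 1024 * 1024  # 8MB, within the 5-10MB range mentioned above

def resumable_upload(api, file_path: str, upload_id: str) -> dict:
    """Client-side sketch: upload only the chunks the server is still missing.
    `api` is a hypothetical client wrapping VideoUploadService's endpoints."""
    with open(file_path, 'rb') as f:
        f.seek(0, 2)                      # seek to end to get the file size
        total_chunks = math.ceil(f.tell() / CHUNK_SIZE)

        # Fresh upload -> all chunks missing; resumed upload -> only the gaps
        missing = api.get_missing_chunks(upload_id) or range(1, total_chunks + 1)

        for chunk_number in missing:      # S3 part numbers start at 1
            f.seek((chunk_number - 1) * CHUNK_SIZE)
            api.upload_chunk(upload_id, chunk_number, f.read(CHUNK_SIZE))

    return api.complete_upload(upload_id)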

Video Processing Pipeline

import subprocess
from concurrent.futures import ThreadPoolExecutor

class VideoProcessor:
    """
    Transcodes raw video to multiple resolutions using FFmpeg.
    Runs as distributed workers consuming from the processing queue.
    (s3 and videos_db are assumed shared storage/DB clients, as elsewhere
    in this design.)
    """
    RESOLUTIONS = [
        ('360p',  640,  360,  500_000),     # 500 Kbps
        ('480p',  854,  480,  1_000_000),   # 1 Mbps
        ('720p',  1280, 720,  2_500_000),   # 2.5 Mbps
        ('1080p', 1920, 1080, 5_000_000),   # 5 Mbps
        ('4k',    3840, 2160, 15_000_000),  # 15 Mbps
    ]

    def process(self, video_id: str, s3_key: str):
        # Download from S3
        local_path = self._download(s3_key)

        # Extract metadata (duration, fps, codec)
        metadata = self._probe(local_path)

        # Transcode each rendition in parallel (skip anything above source quality)
        included = [res for res in self.RESOLUTIONS
                    if self._should_include(metadata, res)]
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self._transcode, local_path, video_id, res)
                for res in included
            ]
            results = [f.result() for f in futures]

        # Generate HLS playlists (master + per-resolution)
        self._generate_hls_manifest(video_id, results)

        # Generate thumbnail from a frame at 10% of the duration
        self._extract_thumbnail(local_path, video_id, metadata['duration'])

        # Mark the video as published, listing only the renditions actually produced
        videos_db.update(video_id, {
            'status': 'published',
            'duration': metadata['duration'],
            'resolutions': [name for name, _, _, _ in included],
            'thumbnail_url': f"https://cdn.example.com/thumbs/{video_id}.jpg"
        })

    def _transcode(self, input_path: str, video_id: str,
                   resolution: tuple) -> str:
        name, width, height, bitrate = resolution
        output_key = f"processed/{video_id}/{name}.m3u8"
        # FFmpeg command (simplified; a production pipeline also pins the video
        # codec, GOP size, and keyframe alignment across renditions)
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f'scale={width}:{height}',
            '-b:v', str(bitrate),
            '-codec:a', 'aac', '-b:a', '128k',
            '-hls_time', '6',  # 6-second segments
            '-hls_playlist_type', 'vod',
            '-hls_segment_filename', f'/tmp/{video_id}_{name}_%05d.ts',
            f'/tmp/{video_id}_{name}.m3u8'
        ]
        subprocess.run(cmd, check=True)
        # Upload the rendition playlist (the generated .ts segments must be
        # uploaded alongside it so players can fetch them)
        s3.upload(output_key, f'/tmp/{video_id}_{name}.m3u8')
        return output_key

    def _generate_hls_manifest(self, video_id: str, segment_urls: list):
        """Generate master HLS playlist for adaptive bitrate."""
        master = "#EXTM3U\n#EXT-X-VERSION:3\n"
        bandwidth_map = {
            '360p': 500000, '480p': 1000000,
            '720p': 2500000, '1080p': 5000000, '4k': 15000000
        }
        for url in segment_urls:
            res = url.split('/')[-1].replace('.m3u8', '')
            bw = bandwidth_map.get(res, 1000000)
            master += f"#EXT-X-STREAM-INF:BANDWIDTH={bw}\n{url}\n"
        s3.upload(f"processed/{video_id}/master.m3u8", master)

    def _should_include(self, metadata: dict, resolution: tuple) -> bool:
        """Don't upscale: skip resolutions higher than source."""
        _, _, height, _ = resolution
        return metadata.get('height', 0) >= height

Adaptive Bitrate Streaming (ABR)

YouTube uses DASH (Dynamic Adaptive Streaming over HTTP); Apple's HLS (HTTP Live Streaming) is the other dominant protocol and is required for native Safari/iOS playback. Both work the same way:

  • Video is split into short segments (2-10 seconds each)
  • Each segment is encoded at multiple bitrates
  • Client player monitors bandwidth and switches quality per-segment
  • Master playlist tells the client which URL to use for each quality level

class ABRPlayer:
    """Client-side adaptive bitrate logic (conceptual)."""
    def __init__(self):
        self.buffer_size = 0        # Seconds of video buffered
        self.current_quality = '480p'
        self.download_speeds = []   # Last N segment download speeds

    def select_quality(self) -> str:
        recent = self.download_speeds[-5:]
        avg_speed = sum(recent) / max(len(recent), 1)
        # Map measured bandwidth (bits/sec) to a quality tier
        if avg_speed > 10_000_000:   return '4k'
        elif avg_speed > 4_000_000:  return '1080p'
        elif avg_speed > 2_000_000:  return '720p'
        elif avg_speed > 800_000:    return '480p'
        else:                        return '360p'

    def should_buffer(self) -> bool:
        return self.buffer_size < 15  # Buffer 15 seconds ahead
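
A sketch of how a playback loop might drive this player; the manifest layout and the injected fetch_segment helper are assumptions for illustration, not part of the HLS spec or the player above:

import time
from typing import Callable, Dict, List

def playback_loop(player: ABRPlayer,
                  manifest: Dict[str, List[str]],
                  fetch_segment: Callable[[str], bytes]) -> None:
    """Conceptual playback loop: fetch each segment, measure throughput,
    and let the player re-pick quality per segment. `manifest` maps a
    quality name to its ordered segment URLs; `fetch_segment` is an
    injected HTTP helper returning segment bytes."""
    SEGMENT_SECONDS = 6  # matches the transcoder's -hls_time setting
    for i in range(len(manifest[player.current_quality])):
        player.current_quality = player.select_quality()

        start = time.monotonic()
        data = fetch_segment(manifest[player.current_quality][i])
        elapsed = max(time.monotonic() - start, 1e-6)

        # Record throughput in bits/sec to feed the next quality decision
        player.download_speeds.append(len(data) * 8 / elapsed)
        player.buffer_size += SEGMENT_SECONDS

        if not player.should_buffer():
            # Buffer is full enough; simulate playback draining one segment
            time.sleep(SEGMENT_SECONDS)
            player.buffer_size -= SEGMENT_SECONDS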

CDN Strategy

YouTube’s CDN (Google Global Cache) uses a two-tier approach:

  • ISP-embedded caches: Google negotiates with ISPs to place cache servers inside their networks. Popular videos are pre-seeded. Cache hit rate >95% for top content.
  • Edge PoPs: 130+ points of presence worldwide. Cache misses fetch from origin (S3) and populate the edge.
  • Cache key: video_id + resolution + segment_number — segment-level granularity allows efficient caching even if user seeks to a non-start position.
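
A minimal sketch of that segment-level caching at an edge PoP, assuming a simple in-process LRU and an injected origin_fetch helper (illustrative only; a real PoP would use dedicated cache servers and disk tiers):

from collections import OrderedDict
from typing import Callable

class EdgeSegmentCache:
    """Illustrative edge-PoP cache keyed at segment granularity."""
    def __init__(self, origin_fetch: Callable[[str], bytes], max_items: int = 10_000):
        self.origin_fetch = origin_fetch   # pulls from processed storage (S3) on a miss
        self.max_items = max_items
        self.cache = OrderedDict()         # in-process LRU stand-in

    @staticmethod
    def cache_key(video_id: str, resolution: str, segment_number: int) -> str:
        # Segment-level key: seeking to minute 40 only warms that segment
        return f"{video_id}:{resolution}:{segment_number}"

    def get_segment(self, video_id: str, resolution: str, segment_number: int) -> bytes:
        key = self.cache_key(video_id, resolution, segment_number)
        if key in self.cache:
            self.cache.move_to_end(key)     # cache hit: refresh LRU position
            return self.cache[key]

        data = self.origin_fetch(key)       # miss: fetch from origin, populate edge
        self.cache[key] = data
        if len(self.cache) > self.max_items:
            self.cache.popitem(last=False)  # evict the least-recently-used segment
        return data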

Metadata Storage and Search

# Video metadata: MySQL or PostgreSQL (small-medium scale)
# (the inline INDEX syntax below is MySQL; PostgreSQL uses separate CREATE INDEX statements)
CREATE TABLE videos (
    video_id    BIGINT PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    title       VARCHAR(500),
    description TEXT,
    duration    INT,           -- seconds
    view_count  BIGINT DEFAULT 0,
    like_count  BIGINT DEFAULT 0,
    status      VARCHAR(20),   -- processing, published, removed
    created_at  TIMESTAMP,
    INDEX (user_id),
    INDEX (created_at)
);

# Search: Elasticsearch index
# Documents: {video_id, title, description, tags, channel_name}
# Query: multi-match across title (boost: 3), tags (boost: 2), description
# Ranking: BM25 score × recency_boost × view_count_boost
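
As a hedged illustration of such a query with the Python Elasticsearch client (field boosts, decay parameters, and the endpoint are placeholders, not YouTube's actual ranking function):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

def search_videos(query: str, size: int = 20):
    """Multi-match across title/tags/description, with simple popularity
    and recency boosts layered on top of BM25 (illustrative weights)."""
    body = {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": query,
                        "fields": ["title^3", "tags^2", "description"]
                    }
                },
                "functions": [
                    {   # More views -> mild, log-scaled score boost
                        "field_value_factor": {
                            "field": "view_count",
                            "modifier": "log1p",
                            "factor": 0.1
                        }
                    },
                    {   # Newer uploads decay less
                        "gauss": {
                            "created_at": {"origin": "now", "scale": "30d", "decay": 0.5}
                        }
                    }
                ],
                "score_mode": "multiply",
                "boost_mode": "multiply"
            }
        },
        "size": size
    }
    return es.search(index="videos", body=body)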

View Count Scalability

"""
Naive: UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?
Problem: 11,600 concurrent streams × 1 update/second = 11,600 writes/sec
         → database bottleneck

Solution: Count-min sketch + Redis atomic counters
"""
class ViewCountService:
    BATCH_FLUSH_INTERVAL = 60  # Seconds
    BATCH_SIZE_THRESHOLD = 10_000

    def record_view(self, video_id: str, user_id: str):
        # Deduplicate: don't count re-watches within 24h
        key = f"view:{user_id}:{video_id}"
        if redis.set(key, 1, nx=True, ex=86400):  # NX = only if not exists
            # Increment atomic counter in Redis
            redis.incr(f"view_count:{video_id}")
            # Add to batch queue for DB flush
            redis.rpush("view_count_batch", f"{video_id}:{int(time.time())}")

    def flush_to_db(self):
        """Runs periodically (every 60s) to batch-update the DB."""
        counts = {}
        batch_size = redis.llen("view_count_batch")
        for _ in range(min(batch_size, self.BATCH_SIZE_THRESHOLD)):
            entry = redis.lpop("view_count_batch")
            video_id = entry.split(':')[0]
            counts[video_id] = counts.get(video_id, 0) + 1

        # Batch update
        for video_id, count in counts.items():
            db.execute(
                "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
                [count, video_id]
            )

Recommendation System (Simplified)

  • Candidate generation: Collaborative filtering (users with similar watch history); content-based (same channel, tags, topic)
  • Ranking: ML model trained on engagement signals (click-through rate, watch percentage, likes, shares)
  • Two-stage funnel: Retrieve thousands of candidates → rank top 100 → present top 10 with diversity constraints (see the sketch after this list)
  • Offline training: Daily batch jobs on BigQuery/Spark; model pushed to serving infrastructure
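
A conceptual sketch of that two-stage funnel; the candidate sources, ranking model, and diversity rule are illustrative stand-ins, not YouTube's actual system:

from typing import Callable, Dict, List

def recommend(user_id: str,
              candidate_sources: List[Callable[[str], List[dict]]],
              rank_model: Callable[[str, dict], float],
              top_k: int = 10) -> List[dict]:
    """Two-stage funnel sketch: cheap candidate generation, an ML ranker,
    then a simple per-channel diversity rule on the final slate."""
    # Stage 1: candidate generation (collaborative filtering, same channel,
    # matching tags/topics, ...) -> thousands of rough candidates, deduplicated
    candidates: Dict[str, dict] = {}
    for source in candidate_sources:
        for video in source(user_id):
            candidates.setdefault(video['video_id'], video)

    # Stage 2: score with the engagement model, keep the top ~100
    ranked = sorted(candidates.values(),
                    key=lambda v: rank_model(user_id, v),
                    reverse=True)[:100]

    # Final slate: cap videos per channel so one creator does not dominate
    slate, per_channel = [], {}
    for video in ranked:
        channel = video.get('channel_id')
        if per_channel.get(channel, 0) >= 2:
            continue
        per_channel[channel] = per_channel.get(channel, 0) + 1
        slate.append(video)
        if len(slate) == top_k:
            break
    return slate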

Interview Discussion Points

  • Resume upload: Track uploaded chunks server-side; client queries which chunks are missing and resumes from there
  • Geographic compliance: Geo-block at CDN level using viewer IP; serve different content per region for licensing
  • Copyright detection: Content ID — fingerprint every uploaded video; scan against database of copyrighted content using audio/video fingerprinting (perceptual hash)
  • Hot vs cold storage: Popular videos on fast SSDs/CDN; videos with zero views for 6 months moved to Glacier archival storage (see the sketch after this list)
  • Live streaming: Different pipeline — RTMP ingest → real-time segmentation → low-latency HLS (LL-HLS with 2-second segments)
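
A hedged sketch of the cold-storage sweep for the hot/cold point above, assuming a last_viewed_at column (not in the schema earlier) and using S3's copy-in-place to change storage class; thresholds and the bucket name are illustrative:

import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client('s3')
BUCKET = 'processed-videos'        # illustrative bucket name
COLD_AFTER = timedelta(days=180)   # "no views for 6 months"

def archive_cold_videos(db):
    """Periodic job: move long-unwatched videos to Glacier-class storage."""
    cutoff = datetime.now(timezone.utc) - COLD_AFTER
    rows = db.execute(
        "SELECT video_id FROM videos "
        "WHERE status = 'published' AND last_viewed_at < %s",
        [cutoff]
    )
    paginator = s3.get_paginator('list_objects_v2')
    for (video_id,) in rows:
        # Copy every object for this video onto itself with a colder storage class
        for page in paginator.paginate(Bucket=BUCKET, Prefix=f"processed/{video_id}/"):
            for obj in page.get('Contents', []):
                s3.copy_object(
                    Bucket=BUCKET, Key=obj['Key'],
                    CopySource={'Bucket': BUCKET, 'Key': obj['Key']},
                    StorageClass='GLACIER',
                    MetadataDirective='COPY'
                )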

🏢 Asked at: Netflix, Meta, Cloudflare, Twitter/X, Databricks, Snap
