System Design Interview: Design YouTube / Video Streaming Platform
Designing YouTube is one of the most comprehensive system design problems — it covers video upload pipelines, adaptive bitrate streaming, CDN architecture, recommendation systems, and massive scale storage. It’s frequently asked at Netflix, YouTube/Google, Meta, and Twitch.
Functional Requirements
- Upload videos (any format, up to several hours long)
- Stream videos to users globally with adaptive quality
- Search videos by title, description, and tags
- Like, comment, and subscribe to channels
- Personalized video recommendations
- Track view counts and watch history
Non-Functional Requirements
- Scale: 500 hours of video uploaded per minute; 1B+ hours watched per day
- Storage: ~500hr/min × 60min × 1GB/hr = ~30TB raw video per hour
- Read-heavy: upload:watch ratio ≈ 1:200
- Low latency streaming: <2s startup time; buffer-free playback
- Global availability: serve users in 190+ countries
Capacity Estimation
- Uploads: 500 hours/min = 8.3 hours/sec → ~8.3GB/sec raw video ingestion (at ~1GB per video-hour)
- Processed storage: raw 30TB/hr × 5 resolutions × 0.3 compression ≈ 45TB processed per hour of ingestion
- View throughput: 1B hours/day ÷ 86,400s ≈ 11,600 video-hours consumed per second (≈ 42M concurrent streams); peak ~3× average
- Metadata: 800M videos × 1KB = 800GB for video metadata
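The estimates above are easy to sanity-check with back-of-envelope arithmetic. The constants below (1GB per raw video-hour, 5 resolutions, 0.3 compression ratio) are the same rough assumptions used in this section, not measured figures:

```python
# Back-of-envelope capacity math (rough assumptions, not measured figures).

UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_VIDEO_HOUR = 1        # assumed average raw bitrate ≈ 2.2 Mbps
RESOLUTIONS = 5
COMPRESSION_RATIO = 0.3

raw_tb_per_hour = UPLOAD_HOURS_PER_MIN * 60 * RAW_GB_PER_VIDEO_HOUR / 1000
ingest_gb_per_sec = raw_tb_per_hour * 1000 / 3600
processed_tb_per_hour = raw_tb_per_hour * RESOLUTIONS * COMPRESSION_RATIO

watch_hours_per_day = 1_000_000_000
video_hours_per_sec = watch_hours_per_day / 86_400   # ~11,600
concurrent_streams = video_hours_per_sec * 3600      # ~42M

print(f"raw ingest: {raw_tb_per_hour:.0f} TB/hr ({ingest_gb_per_sec:.1f} GB/s)")
print(f"processed:  {processed_tb_per_hour:.0f} TB/hr")
print(f"~{video_hours_per_sec:,.0f} video-hours consumed/sec "
      f"≈ {concurrent_streams / 1e6:.0f}M concurrent streams")
```

In an interview it is the method that matters: state your unit assumptions out loud, since changing the assumed raw bitrate shifts every downstream number.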
High-Level Architecture
[Upload Client]
      │ (chunked multipart)
      ▼
[Upload Service] ──► [Raw Video Storage (S3)]
      │                        │
      ▼                        ▼
[Metadata DB]         [Processing Queue (SQS)]
(video title,                  │
 duration, tags)               ▼
                      [Video Processing Workers]
                      (FFmpeg: transcode to
                       360p, 480p, 720p, 1080p, 4K)
                               │
                               ▼
                      [Processed Video Storage (S3)]
                               │
                               ▼
                      [CDN (CloudFront/Akamai)]
                               │
                               ▼
                      [Streaming Clients]
                      (HLS/DASH adaptive
                       bitrate playback)
Video Upload Pipeline
class VideoUploadService:
    """
    Chunked upload to handle large files reliably.
    Client splits video into 5-10MB chunks, uploads each independently.
    Server reassembles and hands off to processing pipeline.
    """
    def initiate_upload(self, user_id: str, filename: str,
                        total_size: int) -> dict:
        video_id = generate_snowflake_id()
        upload_id = s3.create_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original"
        )['UploadId']
        # Store upload session
        upload_sessions.set(upload_id, {
            'video_id': video_id,
            'user_id': user_id,
            'filename': filename,
            'total_size': total_size,
            'uploaded_chunks': {},
            'status': 'uploading'
        }, ttl=86400)  # 24h to complete
        return {'video_id': video_id, 'upload_id': upload_id}

    def upload_chunk(self, upload_id: str, chunk_number: int,
                     chunk_data: bytes) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        response = s3.upload_part(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            PartNumber=chunk_number,
            Body=chunk_data
        )
        session['uploaded_chunks'][chunk_number] = response['ETag']
        upload_sessions.set(upload_id, session)
        return {'chunk': chunk_number, 'etag': response['ETag']}

    def complete_upload(self, upload_id: str) -> dict:
        session = upload_sessions.get(upload_id)
        video_id = session['video_id']
        # Complete S3 multipart upload (parts must be in PartNumber order)
        parts = [{'PartNumber': n, 'ETag': etag}
                 for n, etag in sorted(session['uploaded_chunks'].items())]
        s3.complete_multipart_upload(
            Bucket='raw-videos',
            Key=f"{video_id}/original",
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
        # Queue for processing
        processing_queue.publish({
            'video_id': video_id,
            'user_id': session['user_id'],
            's3_key': f"{video_id}/original"
        })
        return {'video_id': video_id, 'status': 'processing'}
Video Processing Pipeline
import subprocess
from concurrent.futures import ThreadPoolExecutor

class VideoProcessor:
    """
    Transcodes raw video to multiple resolutions using FFmpeg.
    Runs as distributed workers consuming from processing queue.
    """
    RESOLUTIONS = [
        ('360p',   640,  360,    500_000),  # 500 Kbps
        ('480p',   854,  480,  1_000_000),  # 1 Mbps
        ('720p',  1280,  720,  2_500_000),  # 2.5 Mbps
        ('1080p', 1920, 1080,  5_000_000),  # 5 Mbps
        ('4k',    3840, 2160, 15_000_000),  # 15 Mbps
    ]

    def process(self, video_id: str, s3_key: str):
        # Download from S3
        local_path = self._download(s3_key)
        # Extract metadata (duration, fps, codec)
        metadata = self._probe(local_path)
        # Transcode each applicable resolution in parallel
        included = [res for res in self.RESOLUTIONS
                    if self._should_include(metadata, res)]
        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [
                executor.submit(self._transcode, local_path, video_id, res)
                for res in included
            ]
            results = [f.result() for f in futures]
        # Generate HLS playlist (master + per-resolution)
        self._generate_hls_manifest(video_id, results)
        # Generate thumbnail from frame at 10% duration
        self._extract_thumbnail(local_path, video_id, metadata['duration'])
        # Update video status to published (record only the resolutions produced)
        videos_db.update(video_id, {
            'status': 'published',
            'duration': metadata['duration'],
            'resolutions': [r[0] for r in included],
            'thumbnail_url': f"https://cdn.example.com/thumbs/{video_id}.jpg"
        })

    def _transcode(self, input_path: str, video_id: str,
                   resolution: tuple) -> str:
        name, width, height, bitrate = resolution
        output_key = f"processed/{video_id}/{name}.m3u8"
        # FFmpeg command (simplified; a real pipeline also uploads the .ts segments)
        cmd = [
            'ffmpeg', '-i', input_path,
            '-vf', f'scale={width}:{height}',
            '-b:v', str(bitrate),
            '-codec:a', 'aac', '-b:a', '128k',
            '-hls_time', '6',  # 6-second segments
            '-hls_playlist_type', 'vod',
            f'/tmp/{video_id}_{name}.m3u8'
        ]
        subprocess.run(cmd, check=True)
        s3.upload(output_key, f'/tmp/{video_id}_{name}.m3u8')
        return output_key

    def _generate_hls_manifest(self, video_id: str, playlist_keys: list):
        """Generate master HLS playlist for adaptive bitrate."""
        bandwidth_map = {
            '360p': 500_000, '480p': 1_000_000,
            '720p': 2_500_000, '1080p': 5_000_000, '4k': 15_000_000
        }
        lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
        for key in playlist_keys:
            res = key.split('/')[-1].replace('.m3u8', '')
            bw = bandwidth_map.get(res, 1_000_000)
            lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bw}")
            lines.append(key)
        s3.upload(f"processed/{video_id}/master.m3u8", "\n".join(lines))

    def _should_include(self, metadata: dict, resolution: tuple) -> bool:
        """Don't upscale: skip resolutions higher than the source."""
        _, _, height, _ = resolution
        return metadata.get('height', 0) >= height
Adaptive Bitrate Streaming (ABR)
YouTube uses DASH (Dynamic Adaptive Streaming over HTTP); Apple platforms use HLS (HTTP Live Streaming). Both work the same way:
- Video is split into short segments (2-10 seconds each)
- Each segment is encoded at multiple bitrates
- Client player monitors bandwidth and switches quality per-segment
- Master playlist tells the client which URL to use for each quality level
class ABRPlayer:
    """Client-side adaptive bitrate logic (conceptual)."""
    def __init__(self):
        self.buffer_size = 0           # Seconds of video buffered
        self.current_quality = '480p'
        self.download_speeds = []      # Recent segment download speeds (bits/sec)

    def select_quality(self) -> str:
        # Average over the last 5 segments only
        recent = self.download_speeds[-5:]
        avg_speed = sum(recent) / max(len(recent), 1)
        # Map bandwidth to quality tier (with headroom above the stream bitrate)
        if avg_speed > 10_000_000:
            return '4k'
        elif avg_speed > 4_000_000:
            return '1080p'
        elif avg_speed > 2_000_000:
            return '720p'
        elif avg_speed > 800_000:
            return '480p'
        return '360p'

    def should_buffer(self) -> bool:
        return self.buffer_size < 15   # Buffer 15 seconds ahead
CDN Strategy
YouTube’s CDN (Google Global Cache) uses a two-tier approach:
- ISP-embedded caches: Google negotiates with ISPs to place cache servers inside their networks. Popular videos are pre-seeded. Cache hit rate >95% for top content.
- Edge PoPs: 130+ points of presence worldwide. Cache misses fetch from origin (S3) and populate the edge.
- Cache key: video_id + resolution + segment_number — segment-level granularity allows efficient caching even if the user seeks to a non-start position.
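The two-tier lookup above can be sketched in a few lines. The key format and the in-memory store are illustrative stand-ins; a real edge node would also handle TTLs, eviction, and origin failures:

```python
# Sketch of segment-level CDN cache keying (names and URL layout are illustrative).

def segment_cache_key(video_id: str, resolution: str, segment: int) -> str:
    # Segment-level granularity: a seek to minute 40 only needs the segments
    # around that point, each cacheable independently of the rest.
    return f"{video_id}/{resolution}/seg_{segment:06d}.ts"

class EdgeCache:
    def __init__(self, origin_fetch):
        self._store = {}
        self._origin_fetch = origin_fetch  # called only on cache miss

    def get(self, key: str) -> bytes:
        if key not in self._store:         # miss → pull from origin (S3)
            self._store[key] = self._origin_fetch(key)
        return self._store[key]            # hit → served from edge memory

# usage: second request for the same segment never touches the origin
cache = EdgeCache(origin_fetch=lambda key: b"<segment bytes>")
key = segment_cache_key("dQw4w9WgXcQ", "720p", 412)
cache.get(key)   # miss: fetched from origin, populates the edge
cache.get(key)   # hit: served from the edge
```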
Metadata and Search
-- Video metadata: relational DB at small-medium scale
-- (inline INDEX is MySQL syntax; Postgres uses separate CREATE INDEX statements)
CREATE TABLE videos (
    video_id    BIGINT PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    title       VARCHAR(500),
    description TEXT,
    duration    INT,           -- seconds
    view_count  BIGINT DEFAULT 0,
    like_count  BIGINT DEFAULT 0,
    status      VARCHAR(20),   -- processing, published, removed
    created_at  TIMESTAMP,
    INDEX (user_id),
    INDEX (created_at)
);
# Search: Elasticsearch index
# Documents: {video_id, title, description, tags, channel_name}
# Query: multi-match across title (boost: 3), tags (boost: 2), description
# Ranking: BM25 score × recency_boost × view_count_boost
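The boosted multi-match plus ranking factors could be expressed with Elasticsearch's `function_score` query. The index layout and the boost/decay constants below are assumptions for illustration:

```python
# Hypothetical Elasticsearch query body for the boosted multi-match above.
# Field names, factors, and decay scale are assumptions, not tuned values.
search_body = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "system design interview",
                    # ^N is per-field boost: title 3x, tags 2x
                    "fields": ["title^3", "tags^2", "description"],
                }
            },
            "functions": [
                # view_count boost: log1p dampens runaway popularity
                {"field_value_factor": {"field": "view_count",
                                        "modifier": "log1p",
                                        "factor": 0.1}},
                # recency boost: gaussian decay over ~30 days
                {"gauss": {"created_at": {"origin": "now", "scale": "30d"}}},
            ],
            "score_mode": "sum",        # combine the two functions
            "boost_mode": "multiply",   # BM25 score × combined boost
        }
    }
}
```

In an interview, the key point is the shape: text relevance (BM25) produced by the inner query, multiplied by engagement and recency signals applied as score functions.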
View Count Scalability
"""
Naive: UPDATE videos SET view_count = view_count + 1 WHERE video_id = ?
Problem: 11,600 concurrent streams × 1 update/second = 11,600 writes/sec
→ database bottleneck
Solution: Count-min sketch + Redis atomic counters
"""
class ViewCountService:
    BATCH_FLUSH_INTERVAL = 60       # Seconds
    BATCH_SIZE_THRESHOLD = 10_000

    def record_view(self, video_id: str, user_id: str):
        # Deduplicate: don't count re-watches within 24h
        key = f"view:{user_id}:{video_id}"
        if redis.set(key, 1, nx=True, ex=86400):  # NX = only set if not exists
            # Increment atomic counter in Redis (serves real-time reads)
            redis.incr(f"view_count:{video_id}")
            # Add to batch queue for DB flush
            redis.rpush("view_count_batch", f"{video_id}:{int(time.time())}")

    def flush_to_db(self):
        """Runs periodically (every 60s) to batch-update the DB."""
        counts = {}
        batch_size = redis.llen("view_count_batch")
        for _ in range(min(batch_size, self.BATCH_SIZE_THRESHOLD)):
            entry = redis.lpop("view_count_batch")  # bytes in redis-py; decode if needed
            video_id = entry.split(':')[0]
            counts[video_id] = counts.get(video_id, 0) + 1
        # Batch update: one UPDATE per video, not one per view
        for video_id, count in counts.items():
            db.execute(
                "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s",
                [count, video_id]
            )
Recommendation System (Simplified)
- Candidate generation: Collaborative filtering (users with similar watch history); content-based (same channel, tags, topic)
- Ranking: ML model trained on engagement signals (click-through rate, watch percentage, likes, shares)
- Two-stage funnel: Retrieve thousands of candidates → rank top 100 → present top 10 with diversity constraints
- Offline training: Daily batch jobs on BigQuery/Spark; model pushed to serving infrastructure
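The two-stage funnel above can be sketched end to end. The candidate sources and the ranking score are stand-ins for collaborative filtering and a trained engagement model; the per-channel cap is one simple diversity constraint:

```python
# Minimal sketch of the retrieve-then-rank funnel. `candidate_sources` and
# `rank_score` are stand-ins for real CF retrieval and a trained ML ranker.

def recommend(user, candidate_sources, rank_score, k_retrieve=1000, k_final=10):
    # Stage 1: cheap retrieval from several sources, deduplicated by video id
    candidates = {}
    for source in candidate_sources:
        for video in source(user):
            candidates[video["id"]] = video
    pool = list(candidates.values())[:k_retrieve]

    # Stage 2: expensive ranking, but only over the small retrieved pool
    ranked = sorted(pool, key=lambda v: rank_score(user, v), reverse=True)

    # Diversity constraint: at most 2 videos per channel in the final slate
    slate, per_channel = [], {}
    for v in ranked:
        ch = v["channel"]
        if per_channel.get(ch, 0) < 2:
            slate.append(v)
            per_channel[ch] = per_channel.get(ch, 0) + 1
        if len(slate) == k_final:
            break
    return slate
```

The design point to call out: retrieval must be cheap because it scans millions of videos, while the ranker can be expensive because it only ever sees the retrieved pool.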
Interview Discussion Points
- Resume upload: Track uploaded chunks server-side; client queries which chunks are missing and resumes from there
- Geographic compliance: Geo-block at CDN level using viewer IP; serve different content per region for licensing
- Copyright detection: Content ID — fingerprint every uploaded video; scan against database of copyrighted content using audio/video fingerprinting (perceptual hash)
- Hot vs cold storage: Popular videos on fast SSDs/CDN; videos with zero views for 6 months moved to Glacier (archival storage)
- Live streaming: Different pipeline — RTMP ingest → real-time segmentation → low-latency HLS (LL-HLS with 2-second segments)
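The resume-upload point can be made concrete: given the chunk bookkeeping from the upload service, the server reports which part numbers are missing and the client re-sends only those. The chunk size and helper name are illustrative:

```python
# Sketch of resumable upload: the server reports missing chunks so the client
# re-uploads only those. The session dict mirrors the upload service's
# bookkeeping; the 8MB chunk size is an assumption.

def missing_chunks(session: dict) -> list[int]:
    chunk_size = 8 * 1024 * 1024                       # assumed 8MB chunks
    total = -(-session["total_size"] // chunk_size)    # ceiling division
    have = set(session["uploaded_chunks"])
    return [n for n in range(1, total + 1) if n not in have]

# usage: 100MB file → 13 chunks; parts 3 and 7 were lost mid-transfer
session = {
    "total_size": 100 * 1024 * 1024,
    "uploaded_chunks": {n: f"etag-{n}" for n in range(1, 14) if n not in (3, 7)},
}
missing_chunks(session)   # → [3, 7]
```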