What Is a Video Streaming Platform?
A video streaming platform stores, processes, and delivers video content to millions of concurrent viewers. Examples: YouTube (500 hours of video uploaded per minute), Netflix (200M+ subscribers), Twitch (live streaming). Core challenges: video transcoding at scale, adaptive bitrate streaming, CDN delivery, and minimizing startup latency and buffering.
System Requirements
Functional
- Upload video: ingest raw video, transcode to multiple resolutions
- Stream video: adaptive bitrate based on network conditions
- Search and browse video catalog
- Track view counts, watch history, recommendations
Non-Functional
- 500 hours uploaded per minute; 1B daily views
- Startup latency <2 seconds globally
- Seamless quality adaptation during playback
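The targets above can be turned into rough rates with a few lines of arithmetic. This is only a sketch: the storage figure of 3 GB per hour of video (all renditions combined) is an illustrative assumption, not a number from any real platform.

```python
# Back-of-envelope capacity estimate from the stated targets.
UPLOAD_HOURS_PER_MINUTE = 500
DAILY_VIEWS = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption (illustrative only): all renditions of one hour of video
# together occupy about 3 GB of storage.
ASSUMED_GB_PER_VIDEO_HOUR = 3


def upload_hours_per_day(hours_per_minute: int) -> int:
    """Hours of video ingested in one day."""
    return hours_per_minute * 60 * 24


def average_views_per_second(daily_views: int) -> float:
    """Mean rate of playback starts; real peaks will be higher."""
    return daily_views / SECONDS_PER_DAY


def storage_tb_per_day(hours_per_day: int, gb_per_hour: float) -> float:
    """New storage per day in decimal terabytes."""
    return hours_per_day * gb_per_hour / 1000


hours = upload_hours_per_day(UPLOAD_HOURS_PER_MINUTE)
print(f"uploaded per day: {hours:,} hours")
print(f"average view starts per second: {average_views_per_second(DAILY_VIEWS):,.0f}")
print(f"new storage per day: {storage_tb_per_day(hours, ASSUMED_GB_PER_VIDEO_HOUR):,.0f} TB")
```

Under these assumptions the system ingests 720,000 hours of video per day and serves roughly 11,600 playback starts per second on average, which is why transcoding and delivery dominate the design.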
Upload and Transcoding Pipeline
User upload ──► Upload Service ──► Raw video in S3
                                        │
                              Transcoding Queue (SQS)
                                        │
                            Transcoding Workers (FFmpeg)
                          ┌─────────────┴─────────────┐
                          ▼                           ▼
                Multiple renditions:         Thumbnail extraction
                1080p, 720p, 480p,           (sample frames)
                360p, 240p in HLS/DASH
                          │
                  CDN origin (S3/GCS)
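Transcoding is CPU-intensive: software-encoding a 10-minute 4K video into a full set of renditions on a single core can take many times the video's running time. Parallelize instead: split the source into 1-minute segments at keyframe boundaries, transcode the segments in parallel across workers, then reassemble the outputs in order. Run workers on spot instances for cost efficiency, since a segment job lost to a reclaimed instance can simply be retried. Store each rendition as HLS (HTTP Live Streaming) segments: 2-second .ts chunks plus a .m3u8 manifest file listing all chunks.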
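The split-and-fan-out step can be sketched as follows. The names `SegmentJob`, `plan_segments`, and `transcode_segment` are hypothetical, and the transcode function is a placeholder that only returns an output name. In a real deployment the jobs would go onto the queue and be picked up by separate worker machines running FFmpeg, so the thread pool here is just a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentJob:
    index: int
    start_s: float
    duration_s: float


def plan_segments(total_s: float, segment_s: float = 60.0) -> list[SegmentJob]:
    """Split [0, total_s) into consecutive jobs of at most segment_s seconds."""
    jobs = []
    start = 0.0
    while start < total_s:
        duration = min(segment_s, total_s - start)
        jobs.append(SegmentJob(len(jobs), start, duration))
        start += duration
    return jobs


def transcode_segment(job: SegmentJob) -> str:
    """Placeholder for the real work (e.g. running FFmpeg on one time range).

    Returns a hypothetical output file name for the segment.
    """
    return f"part_{job.index:04d}.mp4"


def transcode_parallel(total_s: float, workers: int = 4) -> list[str]:
    jobs = plan_segments(total_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the outputs are ready to reassemble.
        return list(pool.map(transcode_segment, jobs))


print(transcode_parallel(150.0))
```

A 150-second source yields three jobs (60 s, 60 s, 30 s). Because the results come back in segment order, the reassembly step can concatenate them directly.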
Adaptive Bitrate Streaming (ABR)
The video player first downloads a master manifest (.m3u8) listing the available quality levels, then measures download throughput on each 2-second chunk. If a chunk downloads fast (throughput > rendition bitrate), the player switches to a higher quality for the next chunk; if it downloads slowly, the player switches to a lower quality. Switching happens automatically at chunk boundaries, mid-stream and without interrupting playback, so the user gets the highest quality their connection can sustain without buffering.
#EXTM3U
# Master manifest (m3u8); a valid playlist must begin with the #EXTM3U tag
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
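A minimal sketch of the client-side decision, using the three renditions from the manifest above. The function names and the 0.8 safety factor are assumptions made for illustration. Production players typically also smooth the throughput estimate over several chunks and take the current buffer level into account.

```python
# Renditions from the master manifest: (bandwidth in bits/s, playlist path),
# sorted from lowest to highest bitrate.
RENDITIONS = [
    (1_000_000, "480p/playlist.m3u8"),
    (2_500_000, "720p/playlist.m3u8"),
    (5_000_000, "1080p/playlist.m3u8"),
]


def measured_throughput_bps(chunk_bytes: int, download_seconds: float) -> float:
    """Throughput observed while downloading one chunk, in bits per second."""
    return chunk_bytes * 8 / download_seconds


def choose_rendition(throughput_bps: float, safety_factor: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits the throughput budget.

    safety_factor (an assumed value) leaves headroom for throughput dips.
    Falls back to the lowest rendition when nothing fits.
    """
    budget = throughput_bps * safety_factor
    chosen = RENDITIONS[0][1]
    for bandwidth, playlist in RENDITIONS:
        if bandwidth <= budget:
            chosen = playlist
    return chosen


# A 2-second 720p chunk is about 625,000 bytes (2.5 Mbit/s * 2 s / 8).
fast = measured_throughput_bps(625_000, 0.5)   # 10 Mbit/s
slow = measured_throughput_bps(625_000, 10.0)  # 0.5 Mbit/s
print(choose_rendition(fast))
print(choose_rendition(slow))
```

With the fast download the player steps up to 1080p. With the slow one nothing fits the budget, so it falls back to 480p rather than stalling.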
CDN Architecture
Video segments are large, immutable, and therefore highly cacheable, so 95%+ of traffic is served from CDN edge nodes. After transcoding, segments are pushed to the CDN origin (an S3 bucket), and CDN edge nodes (Cloudflare, Akamai, AWS CloudFront) cache them at PoPs around the world. The first viewer in a region misses the cache (cold start); subsequent viewers hit the edge cache. For popular videos the cache hit rate approaches 100%, and the origin (S3) sees roughly one request per segment per edge node until the cached copy expires or is evicted.
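The cold-start-then-hit behaviour can be illustrated with a toy model. `Origin` and `EdgeNode` are hypothetical classes, and the cache is deliberately simplified: it is unbounded and has no TTL or eviction, which a real CDN would have.

```python
class Origin:
    """Stands in for the S3 origin; counts how many requests reach it."""

    def __init__(self, objects: dict[str, bytes]):
        self.objects = objects
        self.requests = 0

    def get(self, key: str) -> bytes:
        self.requests += 1
        return self.objects[key]


class EdgeNode:
    """A single PoP with an unbounded in-memory cache (no TTL, no eviction)."""

    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: fetch from the origin once, then serve from the edge.
        self.misses += 1
        data = self.origin.get(key)
        self.cache[key] = data
        return data


origin = Origin({"720p/seg_0001.ts": b"video-bytes"})
edge = EdgeNode(origin)
for _ in range(1000):
    edge.get("720p/seg_0001.ts")

hit_rate = edge.hits / (edge.hits + edge.misses)
print(f"hit rate: {hit_rate:.1%}, origin requests: {origin.requests}")
```

One thousand viewers of the same segment produce a single origin request, which is the effect that keeps origin load nearly flat as a video becomes popular.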
Video Metadata Service
videos: id, creator_id, title, description, duration, status,
        thumbnail_url, created_at, view_count
video_renditions: video_id, resolution, bitrate, manifest_url, size_bytes
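Store metadata in a relational DB (PostgreSQL). Update view_count asynchronously via a Kafka consumer rather than on every view request: a per-view UPDATE causes too much write amplification and turns a popular video's row into a write hotspot. Instead, the consumer aggregates view events and applies one batched increment per video every 60 seconds.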
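The batching logic inside such a consumer might look like the sketch below. `ViewCountAggregator` is a hypothetical name, and the Kafka and database calls are left out so that the aggregation itself can be run in isolation.

```python
from collections import Counter


class ViewCountAggregator:
    """Accumulates view events in memory and emits one increment per video."""

    def __init__(self, flush_interval_s: float = 60.0):
        self.flush_interval_s = flush_interval_s
        self.pending = Counter()
        self.last_flush_s = 0.0

    def record_view(self, video_id: str) -> None:
        """Called once per consumed view event."""
        self.pending[video_id] += 1

    def maybe_flush(self, now_s: float) -> list[tuple[str, int]]:
        """Return (video_id, delta) pairs if the interval has elapsed.

        Each pair maps to a single statement such as:
        UPDATE videos SET view_count = view_count + delta WHERE id = video_id
        Returns an empty list when it is not yet time to flush.
        """
        if now_s - self.last_flush_s < self.flush_interval_s:
            return []
        batch = sorted(self.pending.items())
        self.pending.clear()
        self.last_flush_s = now_s
        return batch


agg = ViewCountAggregator()
for video_id in ["v1", "v2", "v1", "v1"]:
    agg.record_view(video_id)

print(agg.maybe_flush(now_s=30.0))  # interval not yet elapsed
print(agg.maybe_flush(now_s=60.0))  # one increment per video
```

Four view events collapse into two database writes. For a viral video the ratio is far larger, since any number of views within the interval becomes one UPDATE.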
Resumable Uploads
Large video files (1GB+) need resumable uploads to survive network interruptions. The protocol: the client initializes an upload session and receives a session URL, then uploads the file in 5MB chunks, each carrying a Content-Range header that states which bytes it contains. The server tracks the last acknowledged byte. After a network failure, the client asks the server for that offset and resumes from there instead of restarting. This is the approach used by the YouTube Data API and GCS resumable uploads.
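The offset-tracking idea can be simulated in memory without any HTTP. `UploadSession`, `upload`, and the `fail_on_attempts` parameter are hypothetical names used only for this sketch, and the chunk size is scaled down from 5MB so the example runs instantly.

```python
class UploadSession:
    """Server side: remembers how many bytes have been received so far."""

    def __init__(self, total_size: int):
        self.total_size = total_size
        self.received = bytearray()

    def next_offset(self) -> int:
        """The byte offset the server expects next (last acknowledged + 1)."""
        return len(self.received)

    def put_chunk(self, start: int, data: bytes) -> int:
        """Accept a chunk only if it begins at the expected offset."""
        if start == len(self.received):
            self.received.extend(data)
        return len(self.received)

    @property
    def complete(self) -> bool:
        return len(self.received) == self.total_size


def upload(session: UploadSession, payload: bytes, chunk_size: int,
           fail_on_attempts: frozenset = frozenset()) -> int:
    """Client side: send chunks in order; returns the number of attempts."""
    if len(payload) != session.total_size:
        raise ValueError("payload size does not match the session")
    offset = 0
    attempts = 0
    while not session.complete:
        attempts += 1
        if attempts in fail_on_attempts:
            # Simulated network drop: the chunk never arrives, so the client
            # asks the server where to resume instead of starting over.
            offset = session.next_offset()
            continue
        offset = session.put_chunk(offset, payload[offset:offset + chunk_size])
    return attempts


payload = bytes(range(256)) * 100  # 25,600 bytes
session = UploadSession(total_size=len(payload))
attempts = upload(session, payload, chunk_size=5000, fail_on_attempts=frozenset({2, 4}))
print(attempts, bytes(session.received) == payload)
```

Six chunks plus two simulated failures take eight attempts, and no byte that the server already acknowledged is sent again.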
Recommendations
Recommendations use a two-stage pipeline: candidate retrieval (collaborative filtering: users who watched this also watched X) followed by ranking (an ML model scores each candidate by predicted watch probability, adjusted for recency and diversity). Store user watch history in Cassandra, which suits write-heavy, time-series data. Train recommendation models offline (daily batch) and serve them from a feature store that also supplies real-time features, such as what the user watched in the last hour.
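A toy version of the two stages, assuming co-watch counts as the collaborative-filtering signal and a plain lookup table standing in for the trained ranking model. All function names and the sample data are illustrative.

```python
from collections import Counter
from itertools import permutations


def build_co_watch(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """For each video, count the other videos watched by the same users."""
    co_watch: dict[str, Counter] = {}
    for videos in histories.values():
        for a, b in permutations(set(videos), 2):
            co_watch.setdefault(a, Counter())[b] += 1
    return co_watch


def retrieve_candidates(user_history: list[str], co_watch: dict[str, Counter],
                        limit: int = 100) -> Counter:
    """Stage 1: sum co-watch counts over the user's history, skipping seen videos."""
    seen = set(user_history)
    scores = Counter()
    for video in seen:
        for other, count in co_watch.get(video, {}).items():
            if other not in seen:
                scores[other] += count
    return Counter(dict(scores.most_common(limit)))


def rank(candidates: Counter, predicted_watch_prob: dict[str, float]) -> list[str]:
    """Stage 2: order candidates by model score (here a lookup table).

    Ties are broken by video id so the output is deterministic.
    """
    return sorted(candidates, key=lambda v: (-predicted_watch_prob.get(v, 0.0), v))


histories = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "d"]}
co_watch = build_co_watch(histories)
candidates = retrieve_candidates(["a"], co_watch)
print(candidates)
print(rank(candidates, {"b": 0.3, "c": 0.7}))
```

Retrieval favours "b" because it co-occurs with "a" more often, but the ranking stage puts "c" first because the (stand-in) model predicts a higher watch probability. That separation is the point of the two-stage design: a cheap, broad first pass followed by a more expensive, precise second pass over a short list.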
Live Streaming Differences
Live streaming adds latency constraints. Standard HLS has 15-30 seconds of latency because players buffer several segments before starting playback. Low-Latency HLS (LL-HLS) brings this down to 2-3 seconds, and WebRTC achieves sub-second latency. Live segments are not cached aggressively: they expire within seconds. The ingest path is: streamer → RTMP → ingest server → transcode on the fly → push to CDN → viewers.
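The relationship between segment duration and latency can be approximated with simple arithmetic. The model below is a deliberate simplification: it assumes the player stays three segments behind the live edge and adds a fixed 2 seconds for encoding and CDN propagation. Both values are assumptions chosen for illustration.

```python
def estimated_live_latency_s(segment_s: float,
                             buffered_segments: int = 3,
                             pipeline_overhead_s: float = 2.0) -> float:
    """Rough glass-to-glass latency for segment-based live streaming.

    buffered_segments and pipeline_overhead_s are assumed values, not
    measurements from any particular player or CDN.
    """
    return segment_s * buffered_segments + pipeline_overhead_s


for segment_s in (2.0, 6.0, 10.0):
    latency = estimated_live_latency_s(segment_s)
    print(f"{segment_s:>4.0f} s segments -> about {latency:.0f} s behind live")
```

Under this model 6-second segments give about 20 seconds of latency, in line with the range quoted for standard HLS, while shrinking the segments shrinks the delay roughly in proportion.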
Interview Tips
- HLS segments + CDN is the core architecture — describe it early.
- Transcoding parallelism (split into segments) shows depth.
- ABR is a client-side algorithm — the server just provides multiple renditions.
- view_count batching via Kafka avoids DB write amplification.