Netflix serves over 200 million subscribers across 190 countries, streaming billions of hours of video per month. Designing a video streaming platform tests your understanding of video transcoding, adaptive bitrate streaming, CDN architecture, and recommendation systems. This guide covers the end-to-end architecture — from content ingestion to playback — with the depth expected at senior engineering interviews.
Content Ingestion and Transcoding
When a studio delivers a master video file (4K, 50+ GB), the transcoding pipeline converts it into dozens of versions optimized for different devices and network conditions. The pipeline runs in six steps:
(1) The master file is uploaded to S3, using multipart upload for large files.
(2) A transcoding job is created and distributed across a fleet of GPU-equipped workers.
(3) Each worker encodes one resolution/bitrate combination using FFmpeg or a custom encoder. Netflix encodes each title into approximately 120 different streams: 10+ resolutions (240p to 4K) x multiple bitrates per resolution x audio tracks (Dolby Atmos, stereo, various languages) x subtitle tracks.
(4) Each encoded stream is segmented into small chunks (2-10 seconds each) for adaptive streaming.
(5) A manifest file (HLS .m3u8 or DASH .mpd) is generated listing all available streams and their segments.
(6) All segments and manifests are uploaded to S3 and distributed to CDN edge servers.
Netflix uses a per-title encoding approach: instead of a fixed bitrate ladder shared by every title, an ML model analyzes each title's content complexity and generates an encoding ladder optimized for that title. Animation, for example, needs less bitrate than an action film at the same visual quality.
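Steps (4) and (5) can be sketched as follows. This is a minimal illustration, not Netflix's pipeline: the ladder values, the Rendition type, and the per-rendition playlist paths are assumptions made up for the example. Only the #EXTM3U and #EXT-X-STREAM-INF tags come from the HLS format, and a production playlist would normally carry more attributes (such as CODECS).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One resolution/bitrate combination in the encoding ladder."""
    width: int
    height: int
    bitrate_kbps: int


# Hypothetical fixed ladder. Per-title encoding would compute these
# values separately for each title instead of hard-coding them.
LADDER = [
    Rendition(426, 240, 400),
    Rendition(1280, 720, 3000),
    Rendition(1920, 1080, 5800),
    Rendition(3840, 2160, 16000),
]


def segment_count(duration_s: float, segment_s: float) -> int:
    """Number of chunks needed to cover the title (last chunk may be short)."""
    return math.ceil(duration_s / segment_s)


def master_playlist(ladder) -> str:
    """Build a minimal HLS master playlist listing one variant per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(ladder, key=lambda r: r.bitrate_kbps):
        # BANDWIDTH is expressed in bits per second.
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bitrate_kbps * 1000},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        # Assumed path layout for the per-rendition media playlist.
        lines.append(f"{r.height}p_{r.bitrate_kbps}k/playlist.m3u8")
    return "\n".join(lines) + "\n"
```

With 4-second segments, a 90-minute title (5,400 seconds) yields segment_count(5400, 4) == 1350 chunks per rendition, which is why the encoding work parallelizes well across workers.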
Adaptive Bitrate Streaming (ABR)
ABR dynamically adjusts video quality based on the viewer's network conditions. The player downloads video in small segments (2-10 seconds). Before each segment, the player estimates available bandwidth and selects the appropriate quality level.
Protocols: HLS (HTTP Live Streaming, from Apple) is widely supported; its .m3u8 manifest lists segment URLs for each quality level. DASH (Dynamic Adaptive Streaming over HTTP) is the open standard; its .mpd manifest serves the same purpose. Both work similarly: the player requests segments via HTTP GET and the CDN serves them like regular files, so no special streaming server is needed.
ABR algorithms:
(1) Throughput-based: estimate bandwidth from the download speed of the previous segment and select the highest quality that fits within the estimate. Simple but reactive (quality drops only after the bandwidth has already dropped).
(2) Buffer-based: maintain a playback buffer (e.g., 30 seconds). If the buffer is full, select higher quality. If the buffer is draining, select lower quality. More stable but slower to react.
(3) Hybrid (Netflix): combine throughput estimation with buffer level and use an ML model trained on playback data to predict the optimal quality. This minimizes rebuffering while maximizing visual quality.
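The throughput-based and buffer-based ideas can be combined in a few lines. The sketch below is a simple rule-based hybrid, not the ML-driven approach attributed to Netflix above. The function names, the buffer thresholds, and the 0.8 safety factor are all assumptions chosen for illustration.

```python
def estimate_throughput_kbps(segment_bytes: int, download_s: float) -> float:
    """Throughput observed while downloading the previous segment."""
    return segment_bytes * 8 / 1000 / download_s


def select_bitrate(
    ladder_kbps,
    throughput_kbps: float,
    buffer_s: float,
    low_buffer_s: float = 10.0,   # assumed "draining" threshold
    high_buffer_s: float = 25.0,  # assumed "nearly full" threshold
    safety: float = 0.8,          # assumed headroom on the estimate
) -> int:
    """Pick the bitrate for the next segment."""
    ladder = sorted(ladder_kbps)
    # Buffer-based rule: when the buffer is draining, protect playback
    # by dropping to the lowest rendition regardless of throughput.
    if buffer_s < low_buffer_s:
        return ladder[0]
    # Throughput-based rule: keep headroom unless the buffer is nearly
    # full, in which case the whole estimate can be spent.
    budget = throughput_kbps if buffer_s >= high_buffer_s else throughput_kbps * safety
    fitting = [b for b in ladder if b <= budget]
    return fitting[-1] if fitting else ladder[0]
```

For example, a 1,000,000-byte segment downloaded in 2 seconds gives an estimate of 4,000 kbps. With a 15-second buffer the budget is 3,200 kbps, so the player picks the 3,000 kbps rendition from a ladder of [400, 3000, 5800, 16000].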
CDN Architecture for Video
Video streaming accounts for the majority of internet traffic. Netflix alone accounts for approximately 15% of global downstream bandwidth.
CDN strategy: Netflix operates its own CDN called Open Connect. Open Connect Appliances (OCAs) are custom servers placed directly inside ISP networks (an embedded CDN). Each OCA has 100+ TB of SSD/HDD storage pre-loaded with popular content. When a subscriber presses play, the Netflix control plane determines which OCA is closest (ideally one within the subscriber's ISP network) and directs the player to stream from that OCA. When the OCA is embedded, the video traffic never crosses the internet backbone; it stays within the ISP network.
Benefits: lower latency (the server is physically close), higher throughput (no internet congestion), lower cost (no transit bandwidth charges), and better user experience (fewer rebuffering events).
Content placement: Netflix pre-positions content based on predicted popularity. A new season of a popular show is pushed to OCAs worldwide before the release date. Less popular content is stored on central OCAs and fetched on demand.
For companies without their own CDN: use CloudFront, Akamai, or Fastly. A multi-CDN strategy uses several CDN providers and routes each request to the fastest or cheapest one (CDN load balancing).
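The steering decision described above (prefer an OCA inside the subscriber's ISP, fall back to a central one) might look like the sketch below. This is an illustration under stated assumptions, not Open Connect's actual logic: the Oca type, the load field, and the max_load cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Oca:
    """Hypothetical view of one appliance as seen by the control plane."""
    name: str
    isp: Optional[str]   # None marks a central (non-embedded) appliance
    titles: frozenset    # title IDs currently stored on this appliance
    load: float          # current utilization, 0.0 to 1.0


def pick_oca(ocas, title_id: str, subscriber_isp: str, max_load: float = 0.9):
    """Return the least-loaded suitable OCA, or None if no OCA has the title."""
    usable = [o for o in ocas if title_id in o.titles and o.load < max_load]
    # Prefer appliances embedded in the subscriber's own ISP so traffic
    # stays inside that network; otherwise fall back to central ones.
    embedded = [o for o in usable if o.isp == subscriber_isp]
    central = [o for o in usable if o.isp is None]
    pool = embedded or central
    return min(pool, key=lambda o: o.load) if pool else None
```

A title that is only stored centrally is served from a central appliance, which matches the on-demand path for less popular content.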
Video Playback Architecture
When a user clicks play:
(1) The client sends a play request to the Netflix API with title_id, device_type, network conditions, and DRM license info.
(2) The playback service determines the optimal streams for this device (4K for a smart TV, 720p max for a mobile phone) and returns the manifest URL.
(3) The client fetches the manifest, which lists all quality levels and segment URLs.
(4) The client requests a DRM license from the license server. Netflix uses Widevine (Android, Chrome), FairPlay (Apple), and PlayReady (Windows, Xbox). The license contains decryption keys valid for the playback session.
(5) The client downloads the first few segments at a low quality (fast start) while estimating bandwidth.
(6) Subsequent segments are downloaded at the ABR-selected quality. The player maintains a 30-second buffer.
Trick play: fast-forward, rewind, and scrubbing require special handling. Netflix pre-generates thumbnail sprite sheets (a grid of small thumbnails, one every few seconds) so the scrub bar shows preview images without downloading full video frames.
Seeking: the player finds the nearest segment boundary and starts downloading from there.
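Seeking and scrub previews both reduce to integer arithmetic on the seek time. The sketch below assumes fixed-length segments, a fixed thumbnail interval, and a row-major sprite grid. The function names and parameters are hypothetical. It resolves a seek to the segment that contains the target time, that is, the boundary at or before the seek point.

```python
def segment_for_seek(seek_s: float, segment_s: float):
    """Return (segment index, segment start time) for a seek position."""
    index = int(seek_s // segment_s)
    return index, index * segment_s


def thumbnail_cell(seek_s: float, interval_s: float, columns: int):
    """Return the (row, column) of the preview thumbnail in the sprite sheet."""
    n = int(seek_s // interval_s)  # which thumbnail covers this time
    return divmod(n, columns)      # row-major grid layout
```

With 4-second segments, seeking to 125 seconds resolves to segment 31, which starts at 124 seconds. With one thumbnail every 10 seconds in a 5-column sheet, the same position maps to row 2, column 2.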
Recommendation Engine
Netflix estimates that its recommendation system is worth $1 billion per year in reduced churn. Architecture:
(1) Collaborative filtering: find users with similar viewing patterns. If User A and User B both watched shows X, Y, Z, and User B also watched show W, recommend W to User A. Matrix factorization (SVD) or neural collaborative filtering learns latent user and item embeddings.
(2) Content-based filtering: analyze content metadata (genre, cast, director, keywords) and match it against user preferences. A user who watches many sci-fi films gets more sci-fi recommendations.
(3) Hybrid model: combine collaborative and content-based signals with contextual features such as time of day (lighter content in the morning), device (movies on TV, short clips on mobile), recent viewing history (continue watching), and trending content in the user's region.
(4) Personalized ranking: the home page rows (“Because you watched X,” “Trending Now,” “New Releases”) are each generated by a different algorithm. Within each row, titles are ranked by predicted engagement probability for the specific user.
(5) Artwork personalization: Netflix selects which thumbnail image to show for each title based on the user's preferences. A romance fan sees the romantic scene; an action fan sees the action scene.
The recommendation pipeline runs offline (batch processing with Spark) to generate candidate sets and online (real-time model serving) to rank and personalize at request time.
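The User A / User B example in step (1) can be reproduced with a small neighborhood-based sketch. It uses Jaccard similarity over watch sets rather than the matrix factorization or neural models mentioned above, purely because it fits in a few lines. The function names and the scoring rule are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two watch histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, histories: dict, top_n: int = 3) -> list:
    """Rank unseen titles by the summed similarity of users who watched them."""
    seen = histories[target]
    scores = {}
    for user, watched in histories.items():
        if user == target:
            continue
        sim = jaccard(seen, watched)
        if sim == 0:
            continue  # no shared titles, so this user tells us nothing
        for title in watched - seen:
            scores[title] = scores.get(title, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

Given histories {"A": {"X", "Y", "Z"}, "B": {"X", "Y", "Z", "W"}, "C": {"Q"}}, calling recommend("A", histories) returns ["W"]: User B shares three of four titles with User A, while User C shares none and is ignored.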
Microservices Architecture
Netflix was an early large-scale adopter of microservices and runs over 1,000 of them. Key services:
API Gateway (Zuul): routes and filters all incoming requests. Handles authentication, rate limiting, and request routing to backend services.
Service discovery (Eureka): services register themselves and discover other services by name.
Circuit breaker (Hystrix, now in maintenance mode, with Resilience4j as the usual replacement): prevents cascading failures when a downstream service is unhealthy.
Configuration (Archaius): dynamic configuration without redeployment.
Data: each microservice owns its data (database-per-service pattern). The user profile and viewing history services use Cassandra. The billing service uses MySQL. The recommendation service uses a custom data store.
Communication: REST/HTTP for synchronous calls, gRPC for high-throughput internal services, and Kafka for asynchronous event streaming (a viewing event triggers recommendation model updates, billing events, and analytics).
Resilience: Netflix designs for failure. Chaos Monkey randomly terminates production instances. Chaos Kong simulates entire region failures. Every service is designed to degrade gracefully when dependencies are unavailable.
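The circuit breaker pattern mentioned above can be sketched in plain Python. This is a generic illustration of the pattern, not the Hystrix or Resilience4j API. The class name, thresholds, and the injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback=None):
        """Run func through the breaker, using fallback to degrade gracefully."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: fail fast without touching the unhealthy dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request, for example breaker.call(fetch_recommendations, fallback=lambda: popular_titles), where both names are hypothetical. While the circuit is open the fallback is served immediately, which keeps a slow dependency from tying up the caller's threads.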