System Design Interview: Design a Content Delivery Network (CDN)

What Is a CDN?

A Content Delivery Network is a globally distributed network of edge servers (Points of Presence, PoPs) that cache and serve content from locations close to end users. Instead of every request crossing the ocean to a single origin, users are served from a nearby PoP, often cutting round-trip latency from roughly 200 ms to under 10 ms and offloading 90%+ of traffic from the origin. Cloudflare, Akamai, AWS CloudFront, and Fastly are major CDNs.

Core Architecture

Anycast Routing

All CDN PoPs announce the same IP prefix via BGP anycast. The user's DNS resolves to the CDN's anycast IP, and the internet's routing protocol naturally directs the user's packets to the "nearest" PoP, typically the one with the shortest AS path. No DNS-based geolocation is needed; anycast gives automatic traffic steering from a single IP address. Note that BGP's notion of nearest (fewest AS hops) is not always the lowest-latency path, so large CDNs also tune their announcements per PoP.
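The routing decision above can be illustrated with a toy model. Real BGP best-path selection compares many attributes (local preference, MED, and more); this sketch keeps only the one the text appeals to, AS-path length. All PoP names and AS numbers are made up for illustration.

```python
# Toy model of BGP best-path selection for an anycast prefix.
# Only AS-path length is considered; real BGP evaluates more attributes.

def best_pop(routes):
    """Pick the route with the shortest AS path (fewest AS hops)."""
    return min(routes, key=lambda r: len(r["as_path"]))

# Hypothetical routes a user's network might learn for the anycast prefix:
routes_seen_by_user = [
    {"pop": "fra1", "as_path": [64500, 64501, 64512]},         # 3 AS hops
    {"pop": "lhr1", "as_path": [64500, 64512]},                # 2 AS hops
    {"pop": "sfo1", "as_path": [64500, 64502, 64503, 64512]},  # 4 AS hops
]

print(best_pop(routes_seen_by_user)["pop"])  # lhr1 (fewest AS hops)
```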

Edge Cache

Each PoP runs a large cache (SSD + RAM). The cache key is the URL plus the request-header values named by the response's Vary header (e.g., Accept-Encoding). Cache policy is controlled by HTTP headers: Cache-Control: max-age=86400 (cache for 1 day), s-maxage (overrides max-age for shared caches such as CDNs), and Surrogate-Control (CDN-only directives, not forwarded to browsers). On a cache miss, the PoP fetches from the origin shield (not directly from the origin) and caches the response.
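A minimal sketch of that cache-key rule, assuming the simplified model from the text (key = URL + the request-header values named by Vary); the URL and header values are illustrative:

```python
# Build an edge cache key from the URL plus the Vary'd request headers.
import hashlib

def cache_key(url, request_headers, vary=("Accept-Encoding",)):
    # HTTP header names are case-insensitive, so normalize before lookup.
    hdrs = {k.lower(): v for k, v in request_headers.items()}
    parts = [url] + [f"{h.lower()}={hdrs.get(h.lower(), '')}" for h in vary]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

k_gzip = cache_key("https://example.com/app.js", {"Accept-Encoding": "gzip"})
k_br   = cache_key("https://example.com/app.js", {"Accept-Encoding": "br"})
assert k_gzip != k_br  # gzip and brotli variants are cached separately
```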

Origin Shield

A regional "shield" PoP sits between edge PoPs and the origin. Cache misses from edge PoPs fan in to the shield rather than all hitting the origin independently, reducing origin load by another 10-50x on misses. This is critical for content that is popular but not globally uniform (regional news, local sports scores).
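The fan-in effect can be sketched in a few lines. This is a single-threaded toy (real shields also coalesce concurrent in-flight requests); the URL and counter are illustrative.

```python
# Toy origin shield: many edge-PoP misses for one key -> one origin fetch.
origin_fetches = 0

def origin_fetch(key):
    global origin_fetches
    origin_fetches += 1
    return f"body-of-{key}"

shield_cache = {}

def shield_get(key):
    if key not in shield_cache:   # only the first edge miss reaches origin
        shield_cache[key] = origin_fetch(key)
    return shield_cache[key]

# Five edge PoPs all miss on the same object:
for _ in range(5):
    shield_get("/news/front-page")

print(origin_fetches)  # 1: the shield absorbed the other four misses
```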

Cache Eviction

CDN edge cache is finite (e.g., 10TB of SSD per PoP). Eviction policies:

• LRU: evict the least recently accessed object. Simple, good for general workloads.
• LFU: evict the least frequently accessed object. Better for skewed access patterns (the top 1% of URLs get 90% of traffic).
• SLRU (Segmented LRU): a two-segment queue in which new objects enter a probation segment and are promoted to a protected segment on second access. This prevents one-time viral content from evicting stably popular content.
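The SLRU policy above can be sketched as follows; segment capacities and key names are illustrative, and real implementations track object sizes rather than counts.

```python
# Minimal SLRU sketch: new keys enter probation; a second access promotes
# them to the protected segment; each segment evicts its own LRU entry.
from collections import OrderedDict

class SLRU:
    def __init__(self, probation_cap=2, protected_cap=2):
        self.probation = OrderedDict()   # recency order: oldest first
        self.protected = OrderedDict()
        self.probation_cap, self.protected_cap = probation_cap, protected_cap

    def access(self, key, value=None):
        if key in self.protected:                    # refresh recency
            self.protected.move_to_end(key)
        elif key in self.probation:                  # second access: promote
            self.protected[key] = self.probation.pop(key)
            if len(self.protected) > self.protected_cap:
                # Demote the protected LRU back to probation, not out of cache.
                old_k, old_v = self.protected.popitem(last=False)
                self.probation[old_k] = old_v
        else:                                        # new key: probation
            self.probation[key] = value
        if len(self.probation) > self.probation_cap: # evict probation LRU
            self.probation.popitem(last=False)

cache = SLRU()
cache.access("popular", 1); cache.access("popular")  # promoted to protected
for k in ("viral-1", "viral-2", "viral-3"):          # one-hit wonders
    cache.access(k, 0)
assert "popular" in cache.protected                  # survived the burst
```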

Cache Invalidation / Purge

When content changes (new article, price update), stale cached copies must be invalidated. Methods:

• TTL expiry: let cached copies expire naturally. Simple; the staleness window equals the TTL (seconds to hours).
• Surrogate-Key / Cache-Tag purge: tag responses with labels (e.g., Surrogate-Key: product-123), then purge every response tagged product-123 with one API call. Used by Fastly and Cloudflare.
• URL purge: invalidate specific URLs. Works across all CDNs but requires knowing every affected URL.
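Tag-based purging comes down to a reverse index from tag to cached URLs. A minimal sketch, with made-up URLs and tag names:

```python
# Surrogate-key (cache-tag) purge: one call invalidates every response
# sharing a tag, regardless of URL.
from collections import defaultdict

cache = {}                    # url -> cached body
tag_index = defaultdict(set)  # tag -> urls carrying that tag

def store(url, body, tags):
    cache[url] = body
    for t in tags:
        tag_index[t].add(url)

def purge_tag(tag):
    for url in tag_index.pop(tag, set()):
        cache.pop(url, None)

store("/product/123", "page", ["product-123", "catalog"])
store("/api/products/123", "json", ["product-123"])
store("/product/456", "page", ["product-456", "catalog"])

purge_tag("product-123")  # one call, every representation of product 123
assert "/product/123" not in cache and "/api/products/123" not in cache
assert "/product/456" in cache  # unrelated entries untouched
```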

SSL/TLS Termination at Edge

A full TLS handshake adds 2 RTTs on TLS 1.2 and 1 RTT on TLS 1.3. The CDN terminates TLS at the edge PoP (close to the user), so those handshake round trips are short. The connection to the origin uses a separate, long-lived TLS session or HTTP over a private backbone. Benefits: (1) reduced TLS handshake latency, (2) centralized TLS certificate management at the CDN, (3) DDoS absorption at the edge (attack traffic never reaches the origin).
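A back-of-envelope calculation shows why this matters; the RTT values are illustrative assumptions, not measurements.

```python
# Time before the first encrypted byte, with vs. without edge termination.
rtt_origin_ms = 200   # user <-> distant origin
rtt_edge_ms = 10      # user <-> nearby PoP
handshake_rtts = 3    # 1 RTT TCP + 2 RTT TLS 1.2 (TLS 1.3 would be 2 total)

print(handshake_rtts * rtt_origin_ms)  # 600 ms handshaking with the origin
print(handshake_rtts * rtt_edge_ms)    # 30 ms handshaking with the edge PoP
```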

Dynamic Content Acceleration

For non-cacheable dynamic content (personalized pages, API responses), a CDN still helps via:

• TCP optimization: the CDN maintains persistent TCP connections to the origin, avoiding a 3-way handshake per request.
• Route optimization: the CDN's private backbone (optimized routing) vs. the public internet.
• Edge compute: run logic at the edge (Cloudflare Workers, Lambda@Edge) for A/B testing, auth, and personalization without an origin round trip.
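The edge-compute case in the list above can be sketched with A/B bucketing: any PoP can assign a stable variant by hashing the user ID, with no origin round trip. The function, experiment name, and variant labels are illustrative, not any particular platform's API.

```python
# Stable A/B variant assignment, computable at any edge PoP.
import hashlib

def ab_variant(user_id, experiment="new-checkout",
               variants=("control", "treatment")):
    # Hash experiment + user so each experiment buckets users independently.
    h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(h, 16) % len(variants)]

# Deterministic: the same user gets the same bucket at every PoP.
assert ab_variant("user-42") == ab_variant("user-42")
```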

Interview Framework

1. What content is being served? Static (images, JS, video) vs. dynamic (API responses)?
2. Cache policy: TTL? Vary headers? Stale-while-revalidate?
3. Invalidation: how fast must changes propagate? (Seconds: cache tags; hours: TTL is fine.)
4. Geographic distribution: how many PoPs? Which regions have the highest traffic?
5. Origin protection: rate limiting, DDoS absorption, origin shield.