System Design Interview: Microservices and Service Mesh (Envoy, Istio, mTLS)

What Is a Service Mesh?

A service mesh is an infrastructure layer that handles service-to-service communication in a microservices architecture. Instead of embedding networking logic (retries, mTLS, circuit breaking, tracing) in each service, a service mesh externalizes it into sidecar proxies deployed alongside every service instance. Envoy is the dominant sidecar proxy. Istio, whose control plane manages a fleet of Envoy sidecars, and Linkerd, which ships its own lightweight proxy, are the most common meshes.

Why Microservices Fail Without a Mesh

    • Duplicate code: every team reimplements retry logic, timeouts, circuit breakers
    • No mTLS: east-west traffic between services is unencrypted and unauthenticated
    • Observability gaps: no uniform distributed tracing or request metrics without instrumentation in every service
    • Service discovery coupling: services hardcode addresses or depend on a shared client library

    Sidecar Proxy Architecture

    Each pod runs two containers: the application container and an Envoy sidecar. iptables rules intercept all inbound and outbound traffic and redirect it through Envoy. The application is unaware of the proxy: it calls other services by their normal names, and inbound requests reach it from Envoy over localhost. Envoy handles everything else: load balancing, retries, circuit breaking, mTLS origination and termination, and emitting metrics/traces.

    The control plane (Istio’s istiod) pushes configuration to all Envoy sidecars via the xDS protocol. Service discovery data, routing rules, and rotated certificates all flow through this control plane channel.
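
In Istio, the sidecar is usually added by automatic injection rather than written into each Deployment by hand. A minimal sketch, assuming a hypothetical namespace called shop: labeling the namespace asks Istio's admission webhook to add the Envoy container, along with the traffic-redirect setup, to every pod created there.

```yaml
# Hypothetical namespace used for illustration.
# The istio-injection label enables automatic sidecar injection
# for pods created in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    istio-injection: enabled
```

Pods that already exist are not modified; they pick up the sidecar the next time they are recreated.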

    Service Discovery

    Two models:

    • Client-side discovery: the client queries a service registry (Consul, Eureka) and picks an instance. Simple, but discovery logic is in every client.
    • Server-side discovery: the client sends to a load balancer/proxy (Envoy, AWS ALB). The proxy queries the registry and routes. The client stays simple: no discovery SDK is needed.

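    A service mesh gives the application the server-side experience, although the proxy is a per-pod sidecar rather than a central load balancer. Kubernetes provides basic discovery through DNS and kube-proxy. With Istio, the sidecar receives the endpoint list from the control plane and routes directly to pod IPs, so meshed traffic bypasses kube-proxy's load balancing and can be subject to advanced policies.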
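
On Kubernetes, the registry entry the mesh works from is an ordinary Service. The sketch below assumes a hypothetical reviews workload in a shop namespace listening on port 9080; none of these names come from a real deployment.

```yaml
# Hypothetical Service: callers address it as
# reviews.shop.svc.cluster.local (or just "reviews" inside the namespace).
apiVersion: v1
kind: Service
metadata:
  name: reviews
  namespace: shop
spec:
  selector:
    app: reviews        # pods with this label become the endpoints
  ports:
  - name: http          # Istio can use the port name prefix to pick the protocol
    port: 9080
    targetPort: 9080
```

The application only ever sees the service name. Which pod actually receives the request is decided by the sidecar, using the endpoints behind this Service.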

    mTLS — Mutual TLS

    Every service gets a short-lived X.509 certificate provisioned by the control plane, with the workload identity encoded as a SPIFFE ID. In Istio, istiod acts as the certificate authority by default; SPIRE is an optional alternative. Envoy validates the peer certificate on every connection, so both sides authenticate. Benefits: (1) encryption of all east-west traffic, (2) strong service identity, so services can’t impersonate each other, (3) zero-trust networking, where firewall rules are no longer the only perimeter. Certificate rotation is automatic and transparent to the application.
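
As a sketch of how this looks in Istio configuration, the first resource below requires mTLS for every workload in a namespace, and the second uses the authenticated identity to restrict who may call a service. The shop namespace, the orders workload, and the checkout service account are hypothetical names chosen for illustration.

```yaml
# Require mutual TLS for all workloads in the (hypothetical) shop namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: shop
spec:
  mtls:
    mode: STRICT          # plaintext connections are rejected
---
# Allow only the checkout service account to call the orders workload.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: orders-allow-checkout
  namespace: shop
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity taken from the peer certificate; corresponds to the
        # SPIFFE ID spiffe://cluster.local/ns/shop/sa/checkout
        principals: ["cluster.local/ns/shop/sa/checkout"]
```

Newer Istio releases also serve these resources under a v1 API version; the fields shown are the same.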

    Circuit Breaker in the Mesh

    Configured declaratively under trafficPolicy in an Istio DestinationRule:

    outlierDetection:
      consecutive5xxErrors: 5       # eject after 5 consecutive 5xx responses
      interval: 10s                 # how often hosts are evaluated for ejection
      baseEjectionTime: 30s         # minimum time an ejected host stays out
      maxEjectionPercent: 50        # at most 50% of hosts ejected

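    Envoy ejects unhealthy endpoints from the load balancing pool. Traffic reroutes to healthy instances. Once the ejection period ends, the host returns to the pool automatically; there is no separate half-open probe. The ejection period is baseEjectionTime multiplied by the number of times the host has been ejected, so a host that keeps failing stays out longer each time. Note that Envoy calls this mechanism outlier detection, and reserves the term circuit breaking for limits on connections and pending requests.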
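
For context, here is a sketch of a complete DestinationRule that pairs outlier detection with connection-pool limits, which act as a bulkhead by capping how much load one caller can place on a struggling service. The reviews host and all numeric limits are illustrative assumptions, not recommended values.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-resilience      # hypothetical name
  namespace: shop               # hypothetical namespace
spec:
  host: reviews
  trafficPolicy:
    connectionPool:             # bulkhead: bound connections and queued requests
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10   # requests allowed to wait for a connection
        maxRequestsPerConnection: 10
    outlierDetection:           # passive health checking of individual hosts
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
```

When the pool limits are exceeded, Envoy fails the excess requests immediately instead of queueing them, which keeps a slow dependency from tying up its callers.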

    Traffic Management

    • Canary deployment: weight 95% to v1, 5% to v2 — controlled via VirtualService
    • Header-based routing: route requests with X-User-Beta: true to v2
    • Fault injection: inject 10% HTTP 500s or 100ms delays for chaos testing
    • Retry policy: retry on 503, up to 3 attempts; Envoy backs off between retries, starting from a 25ms base interval by default
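
The four policies above can be sketched in a single VirtualService plus the DestinationRule that defines the v1 and v2 subsets. The reviews host, the shop namespace, the version labels, and the timeout value are assumptions for illustration.

```yaml
# Subsets map the names v1/v2 to pod labels.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-subsets
  namespace: shop
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: shop
spec:
  hosts:
  - reviews
  http:
  # Rule 1: beta users always go to v2. Faults are injected only on
  # this route so that chaos testing does not touch regular traffic.
  - match:
    - headers:
        x-user-beta:
          exact: "true"
    fault:
      abort:
        percentage:
          value: 10         # 10% of matching requests get an HTTP 500
        httpStatus: 500
      delay:
        percentage:
          value: 10         # 10% of matching requests are delayed
        fixedDelay: 100ms
    route:
    - destination:
        host: reviews
        subset: v2
  # Rule 2: everyone else is split 95/5 for the canary.
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 95
    - destination:
        host: reviews
        subset: v2
      weight: 5
    retries:
      attempts: 3
      perTryTimeout: 2s     # illustrative value
      retryOn: "503"
```

Rules are evaluated in order, so the header match has to come before the catch-all weighted route. Promoting the canary is then a matter of editing the two weights, with no redeploy of either version.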

    Observability

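    Envoy emits metrics (Prometheus), access logs (Loki/Splunk), and trace spans (Zipkin/Jaeger) for every request. Metrics and access logs need no instrumentation in the application, so you get RED metrics (Rate, Errors, Duration) for every service-to-service call automatically. Tracing has one requirement: the application must copy the incoming trace headers onto its outgoing requests, otherwise Envoy cannot link the spans of one request into a single trace.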
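
In Istio, these signals can be tuned mesh-wide with the Telemetry API. A minimal sketch, assuming a tracing provider named zipkin has already been registered in the mesh configuration (that name, and the 10% sampling rate, are illustrative):

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system     # root namespace, so the policy applies mesh-wide
spec:
  tracing:
  - providers:
    - name: zipkin            # assumed to be defined under extensionProviders
    randomSamplingPercentage: 10.0   # trace 10% of requests
  accessLogging:
  - providers:
    - name: envoy             # built-in provider: Envoy access logs to stdout
```

Sampling a fraction of requests keeps tracing overhead and storage bounded, while metrics still cover every call.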

    Interview Framework

    1. How do services discover each other? Client-side vs. server-side vs. mesh.
    2. How is east-west traffic secured? mTLS via sidecar.
    3. How do you prevent cascading failures? Outlier detection plus connection-pool limits (bulkhead) in Envoy.
    4. How do you deploy changes safely? Canary via weighted routing in VirtualService.
    5. How do you observe distributed requests? Trace spans emitted by Envoy, with the application propagating trace headers.