API gateways and service meshes are the networking backbone of modern microservices architectures. They handle cross-cutting concerns — authentication, rate limiting, routing, observability — so individual services don’t have to. Understanding when to use each is a common senior engineering interview topic.
API Gateway: North-South Traffic
An API gateway handles traffic entering the system from external clients (internet → services). It is the single entry point for all external requests.
External Client
│
▼
API Gateway (Kong, AWS API Gateway, nginx, Envoy)
│ Responsibilities:
│ ├── TLS termination
│ ├── Authentication (JWT validation, API keys, OAuth)
│ ├── Rate limiting (per-client, per-endpoint)
│ ├── Request routing (path → service)
│ ├── Request/response transformation
│ ├── Load balancing (to service instances)
│ └── Observability (access logs, metrics, tracing)
│
├── /api/v1/users → User Service
├── /api/v1/orders → Order Service
└── /api/v1/products → Product Service
Authentication at the Gateway
JWT validation flow:
Client → "Authorization: Bearer eyJhbGc..."
Gateway:
1. Parse JWT header → algorithm (RS256/ES256)
2. Fetch public key from JWKS endpoint (cached)
3. Verify signature
4. Check exp, iss, aud claims
5. Extract user_id, tenant_id, scopes
6. Forward to service as X-User-ID, X-Tenant-ID headers
(services trust these headers — no re-validation)
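The verification steps above can be sketched in plain Python. This is a minimal sketch using HS256 and only the standard library so it stays self-contained; a real gateway would use RS256/ES256 with public keys fetched from the JWKS endpoint (step 2), typically via a JWT library. All names (secret, issuer, claim fields) are illustrative.

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_jwt(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    """Steps 1-6 above: parse header, verify signature, check claims, extract identity."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":                      # step 1: reject unexpected algorithms
        raise ValueError("unsupported alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):  # step 3: signature
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] < time.time():                       # step 4: exp claim
        raise ValueError("token expired")
    if claims["iss"] != issuer or claims["aud"] != audience:  # step 4: iss/aud claims
        raise ValueError("wrong issuer or audience")
    # steps 5-6: identity the gateway forwards as trusted headers
    return {"X-User-ID": claims["sub"], "X-Tenant-ID": claims["tenant_id"]}
```

Because downstream services trust the forwarded headers without re-validating, the gateway must strip any X-User-ID / X-Tenant-ID headers arriving from external clients before injecting its own.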
API Key auth:
Client → "X-API-Key: sk_live_xxxxx"
Gateway → hash(key) → lookup in API key store (Redis)
→ return associated tenant_id + rate limits
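The API-key lookup can be sketched the same way. A dict stands in for the Redis key store here; the assumption (stated above) is that keys are stored by hash, so a dump of the store never reveals raw keys. Function names are illustrative.

```python
import hashlib

# In-memory stand-in for the Redis API key store described above.
KEY_STORE = {}

def register_key(raw_key: str, tenant_id: str, rate_limit: int) -> None:
    """Store only the SHA-256 digest of the key, never the raw key."""
    digest = hashlib.sha256(raw_key.encode()).hexdigest()
    KEY_STORE[digest] = {"tenant_id": tenant_id, "rate_limit": rate_limit}

def authenticate(raw_key: str):
    """Gateway-side lookup: hash the presented key, return tenant + limits (or None)."""
    return KEY_STORE.get(hashlib.sha256(raw_key.encode()).hexdigest())
```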
Rate Limiting at the Gateway
Fixed-window counter (Redis + Lua). Note: this is a fixed-window counter, not a true token bucket (a token bucket refills continuously and smooths bursts):
Key: rate_limit:{client_id}:{endpoint}
Per-request Lua script (atomic):
current = INCR key
if current == 1: EXPIRE key window_seconds (set the TTL only when the window opens; re-setting it on every request would keep sliding the window forward)
if current <= limit:
allow request
else:
return 429 Too Many Requests
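The same fixed-window logic can be sketched in-process. A dict stands in for Redis here (INCR/EXPIRE become plain counter updates keyed by window start); a real gateway would run this as the atomic Lua script described above.

```python
import time

class FixedWindowLimiter:
    """In-process sketch of the Redis fixed-window counter above."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (window_start, count)

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        start, count = self.counters.get(key, (window_start, 0))
        if start != window_start:                # previous window expired (EXPIRE)
            start, count = window_start, 0
        if count >= self.limit:
            return False                         # would map to HTTP 429
        self.counters[key] = (start, count + 1)  # INCR
        return True
```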
Response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1713272400 (Unix epoch when window resets)
Retry-After: 15 (seconds until retry allowed)
Distributed rate limiting (multiple gateway instances):
All gateways share Redis cluster for consistent counting
Trade-off: Redis round-trip adds ~1ms per request
Alternative: approximate counting with local buckets + periodic sync
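The approximate alternative can be sketched as a local budget per gateway instance that is reconciled periodically. This trades accuracy (an instance may over- or under-admit between syncs) for zero per-request Redis round-trips; the split strategy and names below are illustrative.

```python
class LocalBudget:
    """Each gateway instance spends from a local slice of the global limit
    and re-syncs against the shared count periodically."""

    def __init__(self, global_limit: int, num_instances: int):
        # Naive split: each instance gets an equal share of the global limit.
        self.local_limit = global_limit // num_instances
        self.used = 0

    def allow(self) -> bool:
        if self.used >= self.local_limit:
            return False
        self.used += 1
        return True

    def sync(self, redis_total_used: int, global_limit: int, num_instances: int):
        # Periodic reconciliation: shrink the local budget when the
        # cluster-wide count is running hot. A real implementation would
        # also push self.used back to Redis here.
        remaining = max(0, global_limit - redis_total_used)
        self.local_limit = remaining // num_instances
        self.used = 0
```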
Service Mesh: East-West Traffic
A service mesh handles traffic between services within the cluster. It operates at the infrastructure layer without requiring application code changes.
Service A ─── Envoy sidecar ──► Envoy sidecar ─── Service B
                   │                  │
                   └────────┬─────────┘
                      Control plane
               (Istio / Linkerd / Consul)
Sidecar proxy handles:
├── mTLS: automatic certificate rotation, mutual auth
├── Load balancing: round-robin, least-request, zone-aware
├── Circuit breaking: open circuit after N failures
├── Retries: automatic retry with exponential backoff
├── Timeouts: per-route timeout enforcement
├── Observability: metrics, distributed traces (no app changes)
└── Traffic shaping: canary deployment, A/B testing
mTLS: Zero-Trust Service Identity
Without service mesh:
Service A → Service B (no authentication — any pod can call any service)
With mTLS (mutual TLS):
Istio CA issues X.509 certificate to each service (SPIFFE format)
Certificate: "spiffe://cluster.local/ns/default/sa/order-service"
Service A presents cert → Service B verifies A's identity
Service B presents cert → Service A verifies B's identity
The channel between the sidecars is encrypted (note: encryption terminates at the sidecar, not inside the application process)
AuthorizationPolicy (Istio):
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
spec:
  selector:
    matchLabels:
      app: order-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/checkout-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/orders"]
Circuit Breaker Pattern
States: CLOSED → OPEN → HALF-OPEN
CLOSED (normal): requests pass through; failure rate tracked
If failure rate > threshold (e.g., 50% in 10s):
→ OPEN: fail fast, return error immediately (no actual call)
OPEN: wait for recovery period (e.g., 30s)
→ HALF-OPEN: allow small fraction of requests through
HALF-OPEN: test if dependency has recovered
If requests succeed → CLOSED (resume normal operation)
If requests fail → OPEN again (wait longer)
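The state machine above can be sketched as a small wrapper around any callable. Thresholds and timings are illustrative, not Envoy's actual defaults, and a production breaker would track a failure rate over a window rather than a bare consecutive count.

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF-OPEN state machine."""

    def __init__(self, failure_threshold=5, recovery_seconds=30):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, now=None):
        now = time.time() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at >= self.recovery_seconds:
                self.state = "HALF-OPEN"       # recovery period over: probe
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"            # trip (or re-trip) the breaker
                self.opened_at = now
            raise
        self.failures = 0
        self.state = "CLOSED"                  # success closes the circuit
        return result
```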
Service mesh implementation (Istio DestinationRule):
trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5    # eject a host after 5 consecutive 5xx responses
    interval: 10s              # evaluation window
    baseEjectionTime: 30s      # how long to eject an unhealthy host
    maxEjectionPercent: 50     # eject at most 50% of hosts
(Envoy's outlier detection ejects individual unhealthy hosts from the load-balancing pool, which approximates the OPEN state per host rather than tripping a single global breaker.)
API Gateway vs Service Mesh Comparison
| Concern | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Authentication | External JWT/API key validation | Internal mTLS identity |
| Rate limiting | Per-client/IP global limits | Service-to-service limits |
| Observability | Access logs, API analytics | Service dependency map, latency breakdown |
| Implementation | Application-aware (paths, headers) | Infrastructure layer (transparent) |
| Typical tools | Kong, AWS API GW, nginx, Traefik | Istio, Linkerd, Consul Connect |
BFF Pattern: Backend for Frontend
Problem: one API must serve multiple clients with different needs
Mobile app: needs lightweight responses, push notifications
Web app: can handle richer data, SSE instead of polling
Partner API: needs different auth, rate limits, data format
Solution: BFF (Backend For Frontend)
Mobile Gateway → Mobile-optimized API → Services
Web Gateway → Web-optimized API → Services
Partner Gateway → Partner API → Services
Benefits:
- Each gateway tailored to client needs (field selection, pagination)
- Independent versioning and deprecation
- Client-specific auth strategies
- Different rate limits per client type
Implementation: GraphQL as BFF (Apollo Federation)
Client sends GraphQL query → BFF resolves to multiple service calls
→ assembles composite response → returns only requested fields
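The resolve-then-project step can be sketched without any GraphQL machinery. The service functions and field names below are stubs for illustration, not a real Apollo Federation setup; the point is that the BFF fans out to multiple services, merges the results, and returns only the fields the client requested.

```python
# Stubbed backing services (illustrative data only).
def user_service(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "avatar_url": "https://example.com/a.png"}

def order_service(user_id):
    return {"orders": [{"id": "o1", "total": 42.0}]}

def bff_profile(user_id, fields):
    """Fan out to services, merge, then project only the requested fields."""
    composite = {**user_service(user_id), **order_service(user_id)}
    return {f: composite[f] for f in fields if f in composite}
```

A mobile client asking for `["id", "name", "orders"]` gets a lightweight payload, while the web BFF could request the richer field set from the same services.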
Interview Discussion Points
- When do you need a service mesh? When you have 10+ microservices and need consistent observability, mTLS, and traffic management without modifying every service. For simple architectures (< 5 services), the operational complexity of Istio outweighs the benefits — use a shared middleware library instead.
- How to handle API versioning? URL path versioning (/v1/, /v2/) is most explicit. Header-based versioning (Accept: application/vnd.api+json;version=2) is RESTful but harder to route. Keep at most 2 major versions in production simultaneously; deprecate old versions with sunset headers and 12-month migration periods.
- Service mesh overhead: Envoy sidecar adds ~5-10ms per hop and 50-100MB RAM per pod. Justify with the operational savings on observability and security. Ambient mesh (Istio 1.15+) removes per-pod sidecars — uses node-level DaemonSet instead, reducing overhead significantly.