API gateways and service meshes are the networking backbone of modern microservices architectures. They handle cross-cutting concerns — authentication, rate limiting, routing, observability — so individual services don’t have to. Understanding when to use each is a common senior engineering interview topic.
API Gateway: North-South Traffic
An API gateway handles traffic entering the system from external clients (internet → services). It is the single entry point for all external requests.
External Client
│
▼
API Gateway (Kong, AWS API Gateway, nginx, Envoy)
│ Responsibilities:
│ ├── TLS termination
│ ├── Authentication (JWT validation, API keys, OAuth)
│ ├── Rate limiting (per-client, per-endpoint)
│ ├── Request routing (path → service)
│ ├── Request/response transformation
│ ├── Load balancing (to service instances)
│ └── Observability (access logs, metrics, tracing)
│
├── /api/v1/users → User Service
├── /api/v1/orders → Order Service
└── /api/v1/products → Product Service
Authentication at the Gateway
JWT validation flow:
Client → "Authorization: Bearer eyJhbGc..."
Gateway:
1. Parse JWT header → algorithm (RS256/ES256)
2. Fetch public key from JWKS endpoint (cached)
3. Verify signature
4. Check exp, iss, aud claims
5. Extract user_id, tenant_id, scopes
6. Forward to service as X-User-ID, X-Tenant-ID headers
(services trust these headers — no re-validation)
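The verification steps above can be sketched in plain Python. This is a minimal sketch using HS256 and only the standard library so it stays self-contained; a real gateway would use RS256/ES256 with public keys fetched from the JWKS endpoint (step 2), typically via a JWT library. All names (secret, issuer, claim fields) are illustrative.

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_jwt(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    """Steps 1-6 above: parse header, verify signature, check claims, extract identity."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":                      # step 1: reject unexpected algorithms
        raise ValueError("unsupported alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):  # step 3: signature
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] < time.time():                       # step 4: exp claim
        raise ValueError("token expired")
    if claims["iss"] != issuer or claims["aud"] != audience:  # step 4: iss/aud claims
        raise ValueError("wrong issuer or audience")
    # steps 5-6: identity the gateway forwards as trusted headers
    return {"X-User-ID": claims["sub"], "X-Tenant-ID": claims["tenant_id"]}
```

Because downstream services trust the forwarded headers without re-validating, the gateway must strip any X-User-ID / X-Tenant-ID headers arriving from external clients before injecting its own.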
API Key auth:
Client → "X-API-Key: sk_live_xxxxx"
Gateway → hash(key) → lookup in API key store (Redis)
→ return associated tenant_id + rate limits
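The API-key lookup can be sketched the same way. A dict stands in for the Redis key store here; the assumption (stated above) is that keys are stored by hash, so a dump of the store never reveals raw keys. Function names are illustrative.

```python
import hashlib

# In-memory stand-in for the Redis API key store described above.
KEY_STORE = {}

def register_key(raw_key: str, tenant_id: str, rate_limit: int) -> None:
    """Store only the SHA-256 digest of the key, never the raw key."""
    digest = hashlib.sha256(raw_key.encode()).hexdigest()
    KEY_STORE[digest] = {"tenant_id": tenant_id, "rate_limit": rate_limit}

def authenticate(raw_key: str):
    """Gateway-side lookup: hash the presented key, return tenant + limits (or None)."""
    return KEY_STORE.get(hashlib.sha256(raw_key.encode()).hexdigest())
```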
Rate Limiting at the Gateway
Fixed-window counter (Redis + Lua). Note: this is a fixed-window counter, not a true token bucket (a token bucket refills continuously and smooths bursts):
Key: rate_limit:{client_id}:{endpoint}
Per-request Lua script (atomic):
current = INCR key
if current == 1: EXPIRE key window_seconds (set the TTL only when the window opens; re-setting it on every request would keep sliding the window forward)
if current <= limit:
allow request
else:
return 429 Too Many Requests
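The same fixed-window logic can be sketched in-process. A dict stands in for Redis here (INCR/EXPIRE become plain counter updates keyed by window start); a real gateway would run this as the atomic Lua script described above.

```python
import time

class FixedWindowLimiter:
    """In-process sketch of the Redis fixed-window counter above."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (window_start, count)

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        start, count = self.counters.get(key, (window_start, 0))
        if start != window_start:                # previous window expired (EXPIRE)
            start, count = window_start, 0
        if count >= self.limit:
            return False                         # would map to HTTP 429
        self.counters[key] = (start, count + 1)  # INCR
        return True
```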
Response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1713272400 (Unix epoch when window resets)
Retry-After: 15 (seconds until retry allowed)
Distributed rate limiting (multiple gateway instances):
All gateways share Redis cluster for consistent counting
Trade-off: Redis round-trip adds ~1ms per request
Alternative: approximate counting with local buckets + periodic sync
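The approximate alternative can be sketched as a local budget per gateway instance that is reconciled periodically. This trades accuracy (an instance may over- or under-admit between syncs) for zero per-request Redis round-trips; the split strategy and names below are illustrative.

```python
class LocalBudget:
    """Each gateway instance spends from a local slice of the global limit
    and re-syncs against the shared count periodically."""

    def __init__(self, global_limit: int, num_instances: int):
        # Naive split: each instance gets an equal share of the global limit.
        self.local_limit = global_limit // num_instances
        self.used = 0

    def allow(self) -> bool:
        if self.used >= self.local_limit:
            return False
        self.used += 1
        return True

    def sync(self, redis_total_used: int, global_limit: int, num_instances: int):
        # Periodic reconciliation: shrink the local budget when the
        # cluster-wide count is running hot. A real implementation would
        # also push self.used back to Redis here.
        remaining = max(0, global_limit - redis_total_used)
        self.local_limit = remaining // num_instances
        self.used = 0
```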
Service Mesh: East-West Traffic
A service mesh handles traffic between services within the cluster. It operates at the infrastructure layer without requiring application code changes.
Service A ─── Envoy sidecar ──► Envoy sidecar ─── Service B
                   │                  │
                   └────────┬─────────┘
                      Control plane
               (Istio / Linkerd / Consul)
Sidecar proxy handles:
├── mTLS: automatic certificate rotation, mutual auth
├── Load balancing: round-robin, least-request, zone-aware
├── Circuit breaking: open circuit after N failures
├── Retries: automatic retry with exponential backoff
├── Timeouts: per-route timeout enforcement
├── Observability: metrics, distributed traces (no app changes)
└── Traffic shaping: canary deployment, A/B testing
mTLS: Zero-Trust Service Identity
Without service mesh:
Service A → Service B (no authentication — any pod can call any service)
With mTLS (mutual TLS):
Istio CA issues X.509 certificate to each service (SPIFFE format)
Certificate: "spiffe://cluster.local/ns/default/sa/order-service"
Service A presents cert → Service B verifies A's identity
Service B presents cert → Service A verifies B's identity
The channel between the sidecars is encrypted (note: encryption terminates at the sidecar, not inside the application process)
AuthorizationPolicy (Istio):
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
spec:
  selector:
    matchLabels:
      app: order-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/checkout-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/orders"]
Circuit Breaker Pattern
States: CLOSED → OPEN → HALF-OPEN
CLOSED (normal): requests pass through; failure rate tracked
If failure rate > threshold (e.g., 50% in 10s):
→ OPEN: fail fast, return error immediately (no actual call)
OPEN: wait for recovery period (e.g., 30s)
→ HALF-OPEN: allow small fraction of requests through
HALF-OPEN: test if dependency has recovered
If requests succeed → CLOSED (resume normal operation)
If requests fail → OPEN again (wait longer)
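The state machine above can be sketched as a small wrapper around any callable. Thresholds and timings are illustrative, not Envoy's actual defaults, and a production breaker would track a failure rate over a window rather than a bare consecutive count.

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF-OPEN state machine."""

    def __init__(self, failure_threshold=5, recovery_seconds=30):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, now=None):
        now = time.time() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at >= self.recovery_seconds:
                self.state = "HALF-OPEN"       # recovery period over: probe
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"            # trip (or re-trip) the breaker
                self.opened_at = now
            raise
        self.failures = 0
        self.state = "CLOSED"                  # success closes the circuit
        return result
```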
Service mesh implementation (Istio DestinationRule):
trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5    # eject a host after 5 consecutive 5xx responses
    interval: 10s              # evaluation window
    baseEjectionTime: 30s      # how long to eject an unhealthy host
    maxEjectionPercent: 50     # eject at most 50% of hosts
(Envoy's outlier detection ejects individual unhealthy hosts from the load-balancing pool, which approximates the OPEN state per host rather than tripping a single global breaker.)
API Gateway vs Service Mesh Comparison
| Concern | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Authentication | External JWT/API key validation | Internal mTLS identity |
| Rate limiting | Per-client/IP global limits | Service-to-service limits |
| Observability | Access logs, API analytics | Service dependency map, latency breakdown |
| Implementation | Application-aware (paths, headers) | Infrastructure layer (transparent) |
| Typical tools | Kong, AWS API GW, nginx, Traefik | Istio, Linkerd, Consul Connect |
BFF Pattern: Backend for Frontend
Problem: one API must serve multiple clients with different needs
Mobile app: needs lightweight responses, push notifications
Web app: can handle richer data, SSE instead of polling
Partner API: needs different auth, rate limits, data format
Solution: BFF (Backend For Frontend)
Mobile Gateway → Mobile-optimized API → Services
Web Gateway → Web-optimized API → Services
Partner Gateway → Partner API → Services
Benefits:
- Each gateway tailored to client needs (field selection, pagination)
- Independent versioning and deprecation
- Client-specific auth strategies
- Different rate limits per client type
Implementation: GraphQL as BFF (Apollo Federation)
Client sends GraphQL query → BFF resolves to multiple service calls
→ assembles composite response → returns only requested fields
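The resolve-then-project step can be sketched without any GraphQL machinery. The service functions and field names below are stubs for illustration, not a real Apollo Federation setup; the point is that the BFF fans out to multiple services, merges the results, and returns only the fields the client requested.

```python
# Stubbed backing services (illustrative data only).
def user_service(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "avatar_url": "https://example.com/a.png"}

def order_service(user_id):
    return {"orders": [{"id": "o1", "total": 42.0}]}

def bff_profile(user_id, fields):
    """Fan out to services, merge, then project only the requested fields."""
    composite = {**user_service(user_id), **order_service(user_id)}
    return {f: composite[f] for f in fields if f in composite}
```

A mobile client asking for `["id", "name", "orders"]` gets a lightweight payload, while the web BFF could request the richer field set from the same services.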
Interview Discussion Points
- When do you need a service mesh? When you have 10+ microservices and need consistent observability, mTLS, and traffic management without modifying every service. For simple architectures (< 5 services), the operational complexity of Istio outweighs the benefits — use a shared middleware library instead.
- How to handle API versioning? URL path versioning (/v1/, /v2/) is most explicit. Header-based versioning (Accept: application/vnd.api+json;version=2) is RESTful but harder to route. Keep at most 2 major versions in production simultaneously; deprecate old versions with sunset headers and 12-month migration periods.
- Service mesh overhead: Envoy sidecar adds ~5-10ms per hop and 50-100MB RAM per pod. Justify with the operational savings on observability and security. Ambient mesh (Istio 1.15+) removes per-pod sidecars — uses node-level DaemonSet instead, reducing overhead significantly.