System Design: Identity and Access Management — Authentication, Authorization, and Token Lifecycle

Core Responsibilities

An identity service (also called an IAM or auth service) has four core responsibilities. Authentication (who are you?): verifying credentials and issuing tokens. Authorization (what can you do?): checking permissions for specific operations. Token lifecycle: issuing, refreshing, and revoking tokens. User management: account creation, password management, and MFA. At companies like Stripe or Cloudflare, the identity service is a foundational internal service that every other service depends on to authenticate and authorize requests.

Authentication Flow (JWT + Refresh Tokens)

Login: the user submits credentials (email + password). The server looks up the stored hash (SELECT password_hash FROM users WHERE email=:e) and checks it with bcrypt.verify(input, hash). On success it issues two tokens. (1) Access token (JWT): short-lived (15 minutes), stateless, signed with the identity service's private key (RS256). It contains user_id, roles, issued_at, and expires_at. (2) Refresh token: long-lived (30 days), an opaque random string stored server-side in the database (refresh_tokens table). On the client, the refresh token lives in an HttpOnly cookie, which JavaScript cannot read, so an XSS payload cannot steal it. The access token is kept in memory or in a non-persistent cookie.

Token refresh: when the access token expires, the client sends the refresh token. The server verifies it against the database, issues a new access token, and optionally rotates the refresh token. With refresh token rotation, each use invalidates the old token and issues a new one; if an already-used token is presented again, that signals theft.

Token revocation: revoking the refresh token (on logout or on suspicious activity) prevents any further access tokens from being issued. Access tokens are stateless, so they cannot be revoked before expiry unless the services add server-side state such as a denylist. The short lifetime (15 minutes) limits the damage a stolen access token can do.
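The rotation step can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the in-memory dictionary stands in for the refresh_tokens table, and the names RefreshTokenStore, RefreshRecord, and family_id are hypothetical. Storing only a SHA-256 digest of each token and revoking every token descended from the same login on reuse are assumptions added here, not something the flow above prescribes.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass

REFRESH_TTL_SECONDS = 30 * 24 * 3600  # 30 days, matching the lifetime in the text


@dataclass
class RefreshRecord:
    user_id: str
    family_id: str      # hypothetical: groups all tokens descended from one login
    expires_at: float
    used: bool = False  # set once the token has been exchanged


class RefreshTokenStore:
    """In-memory stand-in for the refresh_tokens table (assumption)."""

    def __init__(self):
        self._rows = {}  # sha256(token) -> RefreshRecord

    @staticmethod
    def _digest(token):
        # Assumption: keep only a digest so a database leak does not expose tokens.
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id, family_id=None, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # opaque random string
        self._rows[self._digest(token)] = RefreshRecord(
            user_id=user_id,
            family_id=family_id or secrets.token_hex(8),
            expires_at=now + REFRESH_TTL_SECONDS,
        )
        return token

    def revoke_family(self, family_id):
        stale = [k for k, r in self._rows.items() if r.family_id == family_id]
        for key in stale:
            del self._rows[key]

    def rotate(self, token, now=None):
        """Exchange a refresh token; return (user_id, new_refresh_token)."""
        now = time.time() if now is None else now
        rec = self._rows.get(self._digest(token))
        if rec is None or rec.expires_at <= now:
            raise PermissionError("unknown or expired refresh token")
        if rec.used:
            # A rotated token was presented again: treat it as theft.
            self.revoke_family(rec.family_id)
            raise PermissionError("refresh token reuse detected")
        rec.used = True
        return rec.user_id, self.issue(rec.user_id, rec.family_id, now)
```

A caller would invoke rotate() on every refresh request and mint a new access token for the returned user_id. Used records are kept rather than deleted so that a replayed token can be told apart from a token that never existed.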

Authorization: RBAC and ABAC

RBAC (Role-Based Access Control): users are assigned roles (admin, editor, viewer), and roles have permissions (create_post, delete_user, read_analytics). A permission check asks two questions: does the user have role R, and does role R have permission P? The data lives in three tables: roles, user_roles, and role_permissions. RBAC is simple and auditable, and it is used in most B2B SaaS products.

ABAC (Attribute-Based Access Control): permissions depend on attributes of the user, the resource, and the environment. Example: "user can edit this document if they are the owner OR if the document is shared with their team." Policy: can_edit(user, doc) = doc.owner_id == user.id OR user.team_id IN doc.shared_teams. ABAC is more flexible than RBAC but harder to audit, because policies can become complex. It is used in Google Drive and AWS IAM (policies with conditions).

For most products: start with RBAC, and add ABAC-style conditions when RBAC becomes too coarse.
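Both checks are small enough to sketch side by side. In this illustration the dictionaries stand in for the user_roles and role_permissions tables, and the sample users, the User and Document classes, and the has_permission helper are hypothetical names chosen for the example; can_edit follows the policy stated above.

```python
from dataclasses import dataclass, field

# Stand-ins for the user_roles and role_permissions tables (sample data).
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"viewer"},
}
ROLE_PERMISSIONS = {
    "admin": {"create_post", "delete_user", "read_analytics"},
    "editor": {"create_post"},
    "viewer": {"read_analytics"},
}


def has_permission(user_id, permission):
    """RBAC: the user holds some role R, and R grants the permission."""
    roles = USER_ROLES.get(user_id, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


@dataclass
class User:
    id: str
    team_id: str


@dataclass
class Document:
    owner_id: str
    shared_teams: set = field(default_factory=set)


def can_edit(user, doc):
    """ABAC: the decision depends on attributes of the user and the resource."""
    return doc.owner_id == user.id or user.team_id in doc.shared_teams
```

The RBAC check never looks at the resource, which is what makes it easy to audit; the ABAC check needs the document loaded before it can answer.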

Multi-Factor Authentication

MFA adds a second factor after password verification.

TOTP (Time-based One-Time Password): the user enrolls a TOTP app (Google Authenticator, Authy). The server generates a random 20-byte secret, encodes it as base32, and stores it (encrypted) per user. The user scans a QR code that carries the secret. To verify, the server computes TOTP(secret, current_30s_window) and compares the result to the user's input, allowing ±1 window for clock skew.

SMS OTP: generate a 6-digit code, store it with a 10-minute TTL, and send it via SMS. Onboarding is easier, but the method is vulnerable to SIM swapping.

FIDO2/WebAuthn: a hardware key or biometrics (Face ID, fingerprint). It uses a cryptographic challenge-response and is phishing-resistant. This is the gold standard for high-security applications (financial, enterprise).

MFA bypass codes: generate 8 single-use recovery codes at MFA enrollment and store them as hashed values. The user can redeem one if they lose their MFA device; each code is invalidated after use.
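The TOTP verification step can be sketched with only the standard library. This assumes the common authenticator-app defaults from RFC 6238 (HMAC-SHA1, 30-second step, 6 digits); the function names generate_secret, totp, and verify_totp are hypothetical, and encrypting the secret at rest is left out.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time


def generate_secret():
    """Random 20-byte secret, base32-encoded for the enrollment QR code."""
    return base64.b32encode(secrets.token_bytes(20)).decode()


def totp(secret_b32, counter, digits=6):
    """HOTP value for one time-step counter (RFC 4226 dynamic truncation)."""
    key = base64.b32decode(secret_b32)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret_b32, user_code, now=None, step=30, skew=1):
    """Accept the current window plus or minus `skew` windows of clock drift."""
    now = time.time() if now is None else now
    counter = int(now // step)
    for delta in range(-skew, skew + 1):
        candidate = counter + delta
        if candidate < 0:
            continue
        expected = totp(secret_b32, candidate)
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), user_code.encode()):
            return True
    return False
```

A production verifier would typically also remember the last accepted counter per user so that the same code cannot be replayed inside its validity window; that bookkeeping is omitted here.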

Token Storage and Security

Token storage on clients: Access token in memory (a JavaScript variable): cleared on page refresh and harder for an injected script to exfiltrate than a persisted token, although XSS can still make requests on the user's behalf while the page is open. Refresh token in an HttpOnly cookie: not accessible to JavaScript; sent automatically with requests to the same origin. CSRF protection for cookie-based refresh: require a custom header (X-Requested-With: XMLHttpRequest) or a CSRF token. Never store tokens in localStorage: it is readable by any JavaScript on the page, so an XSS attack can steal them.

Service-to-service auth: internal services use short-lived JWTs signed with the identity service's private key, or mutual TLS (mTLS). Public key distribution: services cache the keys from the identity service's JWKS (JSON Web Key Set) endpoint and validate JWT signatures locally, with no round-trip to the identity service per request. JWKS cache TTL: 5-10 minutes. On key rotation: publish the new key alongside the old key (a dual-key period) so that services with a stale cache can still validate tokens, and keep the old key published until every token signed with it has expired.
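The caching behavior on the consuming side might look like the sketch below. JwksCache and its fetch_jwks parameter are hypothetical names: fetch_jwks is any callable that returns the parsed JWKS document (a dict with a "keys" list whose entries carry a "kid"). Refetching early when a token names an unknown kid is an assumption added here to make rotation smoother, and signature verification itself is out of scope.

```python
import time

JWKS_TTL_SECONDS = 300  # 5 minutes, the low end of the range in the text


class JwksCache:
    """Caches signing keys by kid and refreshes them on expiry or unknown kid."""

    def __init__(self, fetch_jwks, ttl=JWKS_TTL_SECONDS, clock=time.monotonic):
        self._fetch = fetch_jwks      # hypothetical: returns {"keys": [...]}
        self._ttl = ttl
        self._clock = clock
        self._keys = {}               # kid -> JWK dict
        self._fetched_at = None

    def _refresh(self):
        jwks = self._fetch()
        self._keys = {key["kid"]: key for key in jwks["keys"]}
        self._fetched_at = self._clock()

    def get_key(self, kid):
        """Return the JWK for `kid`, or None if the issuer does not publish it."""
        stale = (
            self._fetched_at is None
            or self._clock() - self._fetched_at >= self._ttl
        )
        # Assumption: an unknown kid may mean the issuer just rotated keys,
        # so refetch once instead of waiting for the TTL to lapse.
        if stale or kid not in self._keys:
            self._refresh()
        return self._keys.get(kid)
```

Because an unknown kid triggers a fetch, a real deployment would likely rate-limit refreshes so that tokens with made-up kid values cannot be used to hammer the JWKS endpoint.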

