Low-Level Design: Multi-Tenant SaaS Platform — Tenant Isolation, Schema Design, and Rate Limiting

What is Multi-Tenancy

A multi-tenant SaaS serves multiple customers (tenants) from shared infrastructure. Each tenant’s data must be isolated: Tenant A must never be able to see Tenant B’s data. There are three common tenancy models: (1) Silo: each tenant gets dedicated infrastructure (database, compute). This gives maximum isolation and the simplest compliance story at the highest cost, and is typically used for enterprise customers with strict security requirements. (2) Bridge: compute is shared but each tenant has a separate database. Good isolation at moderate cost. (3) Pool: all tenants share the same database and compute. This has the lowest cost but the highest complexity to maintain isolation. Most SaaS startups start here.

Database Schema Approaches

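Schema-per-tenant (PostgreSQL schemas): each tenant has a dedicated schema (namespace) within the same database. Tables are identical in structure but isolated per tenant. Query: SET search_path = tenant_123; SELECT * FROM orders; Pros: strong isolation, easy to add tenant-specific customizations. Cons: schema migrations must run once per tenant (1000 tenants = 1000 migration runs), and querying across tenants is hard.

Shared schema with tenant_id column: all tenants share the same tables, and every row has a tenant_id column. Every query includes WHERE tenant_id = :current_tenant. Index every table on (tenant_id, primary_key) so per-tenant lookups stay fast. Pros: simple migrations, easy cross-tenant analytics. Cons: tenant isolation is enforced in application code only, so a single query that omits the filter can leak data across tenants.

Row-level security (PostgreSQL RLS): moves that filter into the database for the shared-schema model. ALTER TABLE orders ENABLE ROW LEVEL SECURITY; CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::uuid); This enforces isolation at the DB level and greatly reduces the application-layer risk. The application still has to set app.tenant_id on each connection or transaction. Superusers and roles with BYPASSRLS skip policies, and so does the table owner unless FORCE ROW LEVEL SECURITY is set, so the application should connect as a separate, unprivileged role.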
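The shared-schema approach can be sketched as follows. This is a minimal illustration, not a production data layer: it uses SQLite from the Python standard library in place of PostgreSQL, and the class name TenantScopedOrders, the total_cents column, and the tenant ids are hypothetical. The idea it shows is binding tenant_id once, when the accessor is created, so that individual queries cannot forget the filter.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # The primary key leads with tenant_id, so per-tenant lookups use the index.
    conn.execute(
        "CREATE TABLE orders ("
        " tenant_id TEXT NOT NULL,"
        " order_id INTEGER NOT NULL,"
        " total_cents INTEGER NOT NULL,"
        " PRIMARY KEY (tenant_id, order_id))"
    )


class TenantScopedOrders:
    """Order access bound to one tenant; callers cannot omit the tenant filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, order_id: int, total_cents: int) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, order_id, total_cents) VALUES (?, ?, ?)",
            (self._tenant_id, order_id, total_cents),
        )

    def list_all(self) -> list:
        # tenant_id is always supplied from the bound value, never by the caller.
        rows = self._conn.execute(
            "SELECT order_id, total_cents FROM orders "
            "WHERE tenant_id = ? ORDER BY order_id",
            (self._tenant_id,),
        )
        return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
acme = TenantScopedOrders(conn, "tenant_acme")      # hypothetical tenant ids
globex = TenantScopedOrders(conn, "tenant_globex")
acme.add(1, 1999)
acme.add(2, 500)
globex.add(1, 7500)
```

Both tenants can use order_id 1 without colliding, because the key is the pair (tenant_id, order_id). This pattern only narrows the application-layer risk; it does not replace database-level enforcement such as RLS.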

Tenant Routing and Context

Every request must identify the tenant. Methods: (1) Subdomain: company.app.com → extract “company” from the hostname and look up the tenant_id. (2) Custom domain: a domain the customer owns (for example portal.customer.com) → a domain_mappings table maps the domain to a tenant_id. (3) JWT claim: the token contains a tenant_id claim. (4) URL prefix: /api/v1/{tenant_id}/orders. The application sets a request-scoped tenant context on every incoming request. All DB queries, cache keys, and external API calls are namespaced by tenant_id. Never use a cache key without the tenant prefix: data cached for one tenant must never be served to another.
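A rough sketch of tenant resolution and request-scoped context, using Python’s contextvars module. The lookup dictionaries, the domain shop.example.org, the tenant ids, and the function names are all hypothetical stand-ins for the database tables described above. The precedence order (JWT claim, then custom domain, then subdomain) is an assumption made for the example, not something the text prescribes.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical lookup data; in a real system these come from the database.
SUBDOMAIN_TO_TENANT = {"company": "t_001"}
DOMAIN_MAPPINGS = {"shop.example.org": "t_002"}
BASE_DOMAIN = "app.com"

# Request-scoped tenant context; each request (or async task) sees its own value.
current_tenant: ContextVar[Optional[str]] = ContextVar("current_tenant", default=None)


def resolve_tenant(host: str, jwt_claims: Optional[dict] = None) -> Optional[str]:
    """Return the tenant_id for a request, or None if it cannot be identified."""
    host = host.lower().split(":")[0]  # drop any port
    if jwt_claims and "tenant_id" in jwt_claims:
        return jwt_claims["tenant_id"]
    if host in DOMAIN_MAPPINGS:
        return DOMAIN_MAPPINGS[host]
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        return SUBDOMAIN_TO_TENANT.get(host[: -len(suffix)])
    return None


def cache_key(name: str) -> str:
    """Build a cache key that always carries the tenant prefix."""
    tenant_id = current_tenant.get()
    if tenant_id is None:
        raise RuntimeError("no tenant context set for this request")
    return f"tenant:{tenant_id}:{name}"


def handle_request(host: str, jwt_claims: Optional[dict] = None) -> str:
    tenant_id = resolve_tenant(host, jwt_claims)
    if tenant_id is None:
        raise LookupError(f"unknown tenant for host {host!r}")
    token = current_tenant.set(tenant_id)
    try:
        return cache_key("orders:recent")
    finally:
        # Always clear the context so it cannot leak into the next request.
        current_tenant.reset(token)
```

Making cache_key fail when no tenant is set means a missing context surfaces as an error rather than as a shared, un-prefixed key.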

Tenant-Level Configuration

Tenants need customization: logo and brand colors, enabled features (feature flags per tenant), data retention policies (enterprise: 7 years, SMB: 1 year), SSO configuration (SAML, OAuth provider), and custom webhook endpoints. Store tenant configuration in a TenantConfig table with a JSON config column for flexibility. Cache the config in Redis under tenant_config:{tenant_id} with a 5-minute TTL; a change can then take up to 5 minutes to become visible unless the key is deleted when the config is updated. Feature flags: use a Feature table with (tenant_id, feature_name, enabled). The application checks these flags to enable or disable functionality per tenant. This allows gradual rollouts: enable a new feature for 10% of tenants, then expand if no issues appear.
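The caching and gradual-rollout ideas above might look like this in outline. The sketch uses an in-process dictionary as a stand-in for Redis, and the hash-based bucketing is one possible way to pick “10% of tenants” deterministically; the text does not say how tenants are selected. TTLCache, rollout_bucket, is_enabled, and the "features" key inside the config are hypothetical names.

```python
import hashlib
import time
from typing import Callable


class TTLCache:
    """Tiny in-process stand-in for the Redis config cache."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], dict]) -> dict:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()  # cache miss or expired entry: reload from the source
        self._entries[key] = (now + self._ttl, value)
        return value


def rollout_bucket(tenant_id: str, feature_name: str) -> int:
    """Map (tenant, feature) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature_name}:{tenant_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(config: dict, tenant_id: str, feature_name: str, rollout_percent: int = 0) -> bool:
    # An explicit per-tenant flag wins; otherwise fall back to the percentage rollout.
    flags = config.get("features", {})
    if feature_name in flags:
        return bool(flags[feature_name])
    return rollout_bucket(tenant_id, feature_name) < rollout_percent
```

Because the bucket depends only on the tenant and feature name, raising rollout_percent from 10 to 50 keeps the original 10% enabled and adds more tenants, rather than reshuffling who sees the feature.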

Per-Tenant Rate Limiting and Quotas

Prevent one tenant from monopolizing shared resources (the noisy neighbor problem). Quotas: API requests per minute (free: 100, pro: 1000, enterprise: unlimited), storage (GB of data stored), and users (number of user seats). Enforcement: check the tenant’s quota before processing each request. Track usage in Redis: INCR tenant:{id}:api_calls:{minute}, followed by EXPIRE on the same key with a 60-second TTL (INCR itself takes no expiry option). Compare the counter to the quota in TenantConfig. Return HTTP 429 with a Retry-After header when the quota is exceeded. Storage tracking: a nightly job counts data per tenant and stores the result in TenantConfig. Alert when a tenant reaches 80% of quota. For enterprise tenants with high quotas: provide separate dedicated infrastructure (separate DB connection pool, dedicated cache prefix, priority Kafka partitions) to guarantee the SLA regardless of other tenants’ load.
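The per-minute counter described above is a fixed-window rate limiter. Below is a minimal sketch of that logic, with a plain dictionary standing in for Redis so the example runs on its own; key expiry is not modeled, which Redis would handle through the 60-second TTL. FixedWindowLimiter, Decision, and PLAN_QUOTAS are hypothetical names, and the quota values are the ones listed in the text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Per-minute API quotas from the text; None means unlimited.
PLAN_QUOTAS = {"free": 100, "pro": 1000, "enterprise": None}


@dataclass
class Decision:
    status: int       # 200 when allowed, 429 when over quota
    retry_after: int  # seconds until the current window ends (0 when allowed)


class FixedWindowLimiter:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._counts = {}  # stands in for Redis keys; old windows are never evicted here

    def check(self, tenant_id: str, plan: str) -> Decision:
        quota: Optional[int] = PLAN_QUOTAS[plan]
        if quota is None:
            return Decision(200, 0)
        now = int(self._clock())
        minute = now // 60
        key = f"tenant:{tenant_id}:api_calls:{minute}"
        # With Redis this would be INCR on the key, then EXPIRE with a 60-second TTL.
        self._counts[key] = self._counts.get(key, 0) + 1
        if self._counts[key] > quota:
            # Tell the client how long until the next window starts.
            return Decision(429, 60 - now % 60)
        return Decision(200, 0)
```

A caller would translate Decision(429, n) into an HTTP 429 response with Retry-After: n. Injecting the clock keeps the window arithmetic easy to test without sleeping.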

Tenant Onboarding and Offboarding

Onboarding: create a Tenant record, provision TenantConfig with default settings, create the admin user, set up the billing subscription, run a tenant-specific DB migration (for the schema-per-tenant model), and send a welcome email with setup instructions. The onboarding job should be idempotent: it can be retried without creating duplicate data, because each step first checks whether its work has already been done.

Offboarding (tenant cancellation): export all tenant data (GDPR data portability requirement). Set the tenant status to CANCELLING. Schedule data deletion after 30 days, in case the tenant reactivates. After 30 days: delete all rows with this tenant_id across all tables. For schema-per-tenant: DROP SCHEMA tenant_123 CASCADE. Audit logs: retain them even after account deletion for as long as applicable law requires (7 years is a common figure, but the exact period depends on jurisdiction and industry).
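The idempotent onboarding flow described above can be sketched with a check before each step. This is an illustration under simplifying assumptions: an in-memory Store object stands in for the database, billing system, and mailer, the migration step is omitted, and names such as Store, onboard_tenant, the ACTIVE status, and default_plan are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for the database, billing system, and mailer."""
    tenants: dict = field(default_factory=dict)
    configs: dict = field(default_factory=dict)
    admins: dict = field(default_factory=dict)
    subscriptions: dict = field(default_factory=dict)
    welcome_emails_sent: list = field(default_factory=list)


DEFAULT_CONFIG = {"retention_years": 1, "features": {}}


def onboard_tenant(store: Store, tenant_id: str, admin_email: str) -> None:
    """Run every onboarding step; each step is skipped if it was already done."""
    if tenant_id not in store.tenants:
        store.tenants[tenant_id] = {"status": "ACTIVE", "welcomed": False}
    if tenant_id not in store.configs:
        # Deep copy so tenants never share the nested default dictionaries.
        store.configs[tenant_id] = copy.deepcopy(DEFAULT_CONFIG)
    if tenant_id not in store.admins:
        store.admins[tenant_id] = admin_email
    if tenant_id not in store.subscriptions:
        store.subscriptions[tenant_id] = "default_plan"
    # Record that the email went out so a retry does not send it twice.
    if not store.tenants[tenant_id]["welcomed"]:
        store.welcome_emails_sent.append(admin_email)
        store.tenants[tenant_id]["welcomed"] = True
```

If the job crashes after creating the Tenant record but before the subscription exists, running it again fills in only the missing steps. In a real system each check and write would also need to be atomic (for example a unique constraint on tenant_id) so that two concurrent retries cannot both pass the check.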
