What is Multi-Tenancy
A multi-tenant SaaS serves multiple customers (tenants) from the same shared infrastructure. Each tenant’s data must be isolated: Tenant A cannot see Tenant B’s data. Three tenancy models: (1) Silo: each tenant gets their own dedicated infrastructure (database, compute). Maximum isolation, highest cost, easiest compliance. Used by enterprise customers with strict security requirements. (2) Bridge: shared compute but separate databases per tenant. Good isolation, moderate cost. (3) Pool: all tenants share the same database and compute. Lowest cost, highest complexity to maintain isolation. Most SaaS startups start here.
Database Schema Approaches
Schema-per-tenant (PostgreSQL schemas): each tenant has a dedicated schema (namespace) within the same database cluster. Tables are identical but isolated. Query: SET search_path = tenant_123; SELECT * FROM orders. Pros: strong isolation, easy to add tenant-specific customizations. Cons: schema migrations must run per tenant (1000 tenants = 1000 migration runs). Hard to query across tenants. Shared schema with tenant_id column: all tenants share the same tables; every row has a tenant_id column. Every query includes WHERE tenant_id = :current_tenant. Index every table on (tenant_id, primary_key). Pros: simple migrations, easy cross-tenant analytics. Cons: tenant isolation is enforced in application code only (a bug can leak data across tenants). Row-level security (PostgreSQL RLS): CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting(‘app.tenant_id’)::uuid). Enforces isolation at the DB level, removing the application layer risk.
Tenant Routing and Context
Every request must identify the tenant. Methods: (1) Subdomain: company.app.com → extract “company” from the hostname, look up the tenant_id. (2) Custom domain: customers.app.com → a domain_mappings table maps the domain to a tenant_id. (3) JWT claim: token contains tenant_id claim. (4) URL prefix: /api/v1/{tenant_id}/orders. The application sets a request-scoped tenant context on every incoming request. All DB queries, cache keys, and external API calls are namespaced by tenant_id. Never use a global cache key without the tenant prefix — data from one tenant must never serve another.
Tenant-Level Configuration
Tenants need customization: logo and brand colors, enabled features (feature flags per tenant), data retention policies (enterprise: 7 years, SMB: 1 year), SSO configuration (SAML, OAuth provider), custom webhook endpoints. Store tenant configuration in a TenantConfig table with a JSON config column for flexibility. Cache the config in Redis (tenant_config:{tenant_id}) with a 5-minute TTL. Feature flags: use a Feature table with (tenant_id, feature_name, enabled). The application checks feature flags to enable/disable functionality per tenant. This allows gradual rollouts: enable a new feature for 10% of tenants, expand if no issues.
Per-Tenant Rate Limiting and Quotas
Prevent one tenant from monopolizing shared resources (noisy neighbor problem). Quotas: API requests per minute (free: 100, pro: 1000, enterprise: unlimited). Storage: GB of data stored. Users: number of user seats. Enforcement: check the tenant’s quota before processing each request. Track usage in Redis: INCR tenant:{id}:api_calls:{minute} with EX 60. Compare to the quota in TenantConfig. Return HTTP 429 with Retry-After header when exceeded. Storage tracking: a nightly job counts data per tenant and stores it in the TenantConfig. Alert when a tenant reaches 80% of quota. For enterprise tenants with high quotas: separate dedicated infrastructure (separate DB connection pool, dedicated cache prefix, priority Kafka partitions) to guarantee SLA regardless of other tenants’ load.
Tenant Onboarding and Offboarding
Onboarding: create a Tenant record, provision TenantConfig with default settings, create the admin user, set up the billing subscription, run a tenant-specific DB migration (for schema-per-tenant model), send a welcome email with setup instructions. Idempotent: the onboarding job can be retried without creating duplicate data (check if the tenant already exists before each step). Offboarding (tenant cancellation): export all tenant data (GDPR data portability requirement). Set tenant status to CANCELLING. Schedule data deletion after 30 days (in case the tenant reactivates). After 30 days: delete all rows with this tenant_id across all tables. For schema-per-tenant: DROP SCHEMA tenant_123 CASCADE. Audit log: retain for 7 years even after account deletion (legal requirement in many jurisdictions).
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What are the trade-offs between schema-per-tenant and shared-schema multi-tenancy?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Schema-per-tenant: each tenant has its own set of tables in a separate schema (PostgreSQL) or separate database. Pros: strong isolation (a bug cannot leak cross-tenant data), easy to add tenant-specific columns, straightforward GDPR deletion (drop schema), simple to move a tenant to dedicated infrastructure. Cons: migrations must run against every tenant schema — 1000 tenants means 1000 migration runs (slow, error-prone); hard to query across tenants for analytics; connection pooling complexity (must connect to the right schema per request). Shared-schema: all tenants share tables, every row has tenant_id. Pros: single migration run, easy cross-tenant analytics, simpler connection management. Cons: application-level isolation (a missing WHERE clause leaks data); indexes must include tenant_id; one large tenant can degrade performance for others (noisy neighbor). Most startups start shared-schema and migrate selected enterprise customers to dedicated databases as needed.”
}
},
{
“@type”: “Question”,
“name”: “How does PostgreSQL Row-Level Security enforce tenant isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “RLS adds a security policy at the database level that automatically filters rows based on a session variable. Example: SET app.tenant_id = ‘abc-123’; then any SELECT/UPDATE/DELETE on the orders table automatically adds WHERE tenant_id = ‘abc-123’. The policy is defined once: CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting(‘app.tenant_id’)::uuid). Enable with ALTER TABLE orders ENABLE ROW LEVEL SECURITY. Every query is then automatically scoped to the current tenant — even if the application forgets the WHERE clause. Benefits: defense in depth (isolation holds even with application bugs), auditable (policy lives in the database schema), works across all ORM queries transparently. Considerations: the session variable must be set on every connection before queries; connection pooling requires resetting the variable between tenants (PgBouncer transaction-mode does this automatically).”
}
},
{
“@type”: “Question”,
“name”: “How do you implement per-tenant rate limiting without a global lock?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use Redis with atomic increment operations. For each API request: INCR tenant:{tenant_id}:api:{minute} — atomically increment the counter for the current minute. EXPIRE tenant:{tenant_id}:api:{minute} 120 — keep the key for 2 minutes (allows reading the previous minute for sliding window). Compare the result to the tenant’s quota from TenantConfig. If exceeded, return HTTP 429 with Retry-After: {seconds_until_next_minute}. This is O(1) per request with no locks. For sliding window instead of fixed window: use a Redis sorted set with ZADD (score=timestamp, member=request_id) and ZCOUNT to count requests in the last 60 seconds. More accurate but uses more memory per tenant. For burst allowance: token bucket implemented with INCR + TTL on a tokens-remaining key, refilled by a background job.”
}
},
{
“@type”: “Question”,
“name”: “How do you handle database migrations in a multi-tenant SaaS?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “For shared-schema: run migrations once against the shared tables — all tenants see the change simultaneously. Use backwards-compatible migrations (add nullable columns, never drop columns immediately). Deployment sequence: deploy new code with feature flags disabled u2192 run migration u2192 enable feature flags u2192 old code still works with new schema. For schema-per-tenant: migrations must run against each tenant’s schema. Use a migration runner that iterates all active tenants and applies pending migrations. Run in parallel with a worker pool (10-20 concurrent migrations) to avoid taking hours for 1000+ tenants. Track migration state per tenant in a schema_migrations table in each tenant schema. Handle failures: if migration fails for tenant X, log and continue with other tenants. Failed tenants remain on the old schema; retry with a separate job. Never block new deployments on a partial migration failure.”
}
},
{
“@type”: “Question”,
“name”: “How do you isolate a noisy neighbor tenant in a shared-infrastructure SaaS?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Detection: monitor per-tenant resource consumption — queries per second, query latency, rows scanned, storage I/O. Alert when a tenant consumes >10% of cluster resources. Mitigation options (in escalating order): (1) Rate limit the tenant’s API calls (HTTP 429 with clear messaging). (2) Query throttling: add a pg_sleep or similar delay to the noisy tenant’s queries in the application layer. (3) Move the tenant to a dedicated database connection pool with limited connections. (4) Migrate the tenant to their own dedicated database instance (tenant tier upgrade). For the last option: take a logical backup, restore to a new instance, update the tenant routing table to point to the new instance, run in parallel briefly to verify, switch traffic. All this should be doable without tenant-visible downtime using logical replication.”
}
}
]
}
Asked at: Stripe Interview Guide
Asked at: Shopify Interview Guide
Asked at: Atlassian Interview Guide
Asked at: Airbnb Interview Guide