Question 1

How does multi-tenancy isolation work in a CMS database?

Accepted Answer

Multi-tenancy isolation: every tenant's data must be completely invisible to other tenants. Two approaches: (1) Shared database with a tenant_id column: every table has tenant_id, and every query includes WHERE tenant_id = current_tenant_id, enforced by application middleware. This is easier to scale operationally (one database to run), but a single missing WHERE clause can leak data. Defense-in-depth: enable PostgreSQL Row-Level Security (RLS) and add a policy: ALTER TABLE entries ENABLE ROW LEVEL SECURITY; CREATE POLICY tenant_isolation ON entries USING (tenant_id = current_setting('app.tenant_id')::uuid). current_setting() returns text, so cast it to the type of the tenant_id column (uuid is assumed here). The application sets app.tenant_id on the connection for each request; then, even if a query forgets the WHERE clause, the database policy enforces isolation. Superusers and, by default, the table owner bypass RLS, so the application should connect as a separate non-owner role. (2) Separate database per tenant: complete isolation (query mistakes cannot leak data), but harder to operate (thousands of databases). Typically used for enterprise/regulated customers. Most SaaS CMSes use option 1 for standard tenants and option 2 for enterprise.
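The application-layer half of option 1 can be sketched as a repository object that is bound to one tenant and adds the tenant filter to every query itself, so callers cannot forget it. This is only an illustration: SQLite stands in for PostgreSQL (SQLite has no RLS, so the database-level policy is not shown), and the class name, table columns, and tenant names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


class TenantScopedEntries:
    """Hypothetical repository: every query it issues is filtered by tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def list_entries(self) -> List[Tuple[int, str]]:
        # The tenant filter lives here, not in the caller's code.
        cur = self._conn.execute(
            "SELECT id, title FROM entries WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return cur.fetchall()

    def get_entry(self, entry_id: int) -> Optional[Tuple[int, str]]:
        # Looking up another tenant's entry ID returns None, not the row.
        cur = self._conn.execute(
            "SELECT id, title FROM entries WHERE tenant_id = ? AND id = ?",
            (self._tenant_id, entry_id),
        )
        return cur.fetchone()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table holding entries for two made-up tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO entries (tenant_id, title) VALUES (?, ?)",
        [("acme", "Acme home"), ("acme", "Acme blog"), ("globex", "Globex home")],
    )
    conn.commit()
    return conn
```

In a real service the middleware would build one such object per request from the authenticated tenant, and RLS would remain the backstop for any query that bypasses it.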

Question 2

How does scheduled publishing work in a CMS?

Accepted Answer

Scheduled publishing: an entry with status=SCHEDULED and scheduled_publish_at set should automatically transition to PUBLISHED at the scheduled time. Implementation: a background scheduler runs every minute (cron job or Celery Beat). It queries: SELECT * FROM entries WHERE status = 'SCHEDULED' AND scheduled_publish_at <= NOW(). For each result: run the publish() workflow (transition to PUBLISHED, invalidate CDN, fire webhooks). Idempotency: if two scheduler runs overlap (a slow run, a crash and restart, or two scheduler instances), both can select the same entry before either has updated it, and the entry would be published twice. Guard: publish() re-reads the row with SELECT ... FOR UPDATE and checks that status is still SCHEDULED before acting; if the entry is already PUBLISHED, the call is a no-op. Timezone handling: store scheduled_publish_at in UTC in the database; display it in the editor in the user's local timezone.
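A minimal sketch of an idempotent scheduler tick follows. It swaps in a different guard from the one described above: SQLite has no SELECT ... FOR UPDATE, so the status check is folded into a conditional UPDATE, and only the run whose UPDATE actually changes the row goes on to the side effects. The table layout, the function names, and the use of Unix seconds for scheduled_publish_at are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # scheduled_publish_at is stored as Unix seconds (UTC) in this sketch.
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
        "scheduled_publish_at INTEGER)"
    )


def publish_if_scheduled(conn: sqlite3.Connection, entry_id: int) -> bool:
    """Flip SCHEDULED to PUBLISHED; return False if someone else already did."""
    cur = conn.execute(
        "UPDATE entries SET status = 'PUBLISHED' "
        "WHERE id = ? AND status = 'SCHEDULED'",
        (entry_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def run_scheduler_tick(conn: sqlite3.Connection, now: datetime) -> List[int]:
    """Publish every due entry once; return the IDs this tick published."""
    now_ts = int(now.timestamp())
    due = conn.execute(
        "SELECT id FROM entries "
        "WHERE status = 'SCHEDULED' AND scheduled_publish_at <= ? ORDER BY id",
        (now_ts,),
    ).fetchall()
    published = []
    for (entry_id,) in due:
        if publish_if_scheduled(conn, entry_id):
            # Side effects (CDN invalidation, webhooks) would be triggered here,
            # only by the run that won the status transition.
            published.append(entry_id)
    return published
```

Running the tick twice for the same minute publishes each due entry only once; the second run finds nothing in SCHEDULED state.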

Question 3

How do webhooks work in a CMS and how do you guarantee delivery?

Accepted Answer

Webhooks notify external systems of CMS events (entry.published, asset.uploaded) via HTTP POST to a configured URL. The challenge for reliable delivery is that the target URL may be temporarily down. At-least-once delivery: store webhook deliveries in a webhook_deliveries table (webhook_id, event_id, status, attempt_count, next_retry_at, response_code). A dispatcher worker polls for pending deliveries, makes the HTTP POST, and updates status to DELIVERED (on a 2xx response) or FAILED. Retry with increasing backoff: retry after 1min, 5min, 30min, 2hr, 24hr. After five failed retries: mark the delivery as DEAD_LETTERED and alert the developer. Because delivery is at-least-once, a consumer can receive the same event more than once and should deduplicate, for example on the event ID. HMAC signature: include an X-Webhook-Signature header (HMAC-SHA256 of the raw payload using the webhook's secret_key) so consumers can verify that the payload was sent by the CMS and was not tampered with.
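The signing and retry rules above can be sketched with the Python standard library. The function names are hypothetical, the signature is assumed to be hex-encoded, and the retry counter is assumed to count retries only (the initial attempt is not one of the five).

```python
import hashlib
import hmac
from datetime import timedelta
from typing import Optional

# Retry schedule from the text: 1 min, 5 min, 30 min, 2 hr, 24 hr.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=24),
]


def sign_payload(secret_key: bytes, payload: bytes) -> str:
    """Value for the X-Webhook-Signature header (hex HMAC-SHA256 of the body)."""
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def verify_signature(secret_key: bytes, payload: bytes, received: str) -> bool:
    """Consumer side: recompute the HMAC and compare in constant time."""
    expected = sign_payload(secret_key, payload)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def next_retry_delay(retries_done: int) -> Optional[timedelta]:
    """Delay before the next retry, or None when the delivery is dead-lettered.

    retries_done is the number of retries that have already failed
    (0 right after the initial attempt fails).
    """
    if retries_done < 0:
        raise ValueError("retries_done must be non-negative")
    if retries_done >= len(RETRY_DELAYS):
        return None  # all five retries used: mark DEAD_LETTERED
    return RETRY_DELAYS[retries_done]
```

The consumer should verify the signature against the raw request bytes, before any JSON parsing, since re-serializing the body can change its bytes and therefore its HMAC.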

Question 4

How does a headless CMS deliver content and why is caching critical?

Accepted Answer

A headless CMS separates content management (backend) from content presentation (frontend). The delivery API returns structured JSON content that any frontend (website, mobile app, IoT device) consumes. Caching strategy: published content does not change until the next publish, so cache aggressively: (1) Application layer cache: Redis with TTL=5min for published entry data. (2) CDN cache: Cloudflare/CloudFront with Cache-Control: max-age=3600. Each entry has a canonical URL: /api/{tenant}/entries/{entry_id}. On publish: purge both the Redis key and the CDN cache for that URL. Revalidation with ETag: include an ETag (hash of the published version content) in the response. Clients send it back in If-None-Match; the server returns 304 Not Modified with no body if the content is unchanged, saving bandwidth. At scale, the CDN can absorb 95%+ of reads, so the database is rarely hit.
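The ETag exchange can be sketched as a pure function, independent of any web framework. The function names and the choice of SHA-256 over canonical JSON are assumptions for the example; If-None-Match parsing is simplified to a comma-separated list with optional W/ prefixes.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def compute_etag(published_fields: dict) -> str:
    """Quoted hash of the published content, stable across key ordering."""
    canonical = json.dumps(published_fields, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode("utf-8")).hexdigest() + '"'


def _parse_if_none_match(header_value: str) -> List[str]:
    # Simplified parsing: split on commas and drop the weak-validator prefix.
    tags = []
    for raw in header_value.split(","):
        tag = raw.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        if tag:
            tags.append(tag)
    return tags


def respond(
    published_fields: dict, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on an entry's canonical URL."""
    etag = compute_etag(published_fields)
    headers = {"ETag": etag, "Cache-Control": "max-age=3600"}
    if if_none_match:
        candidates = _parse_if_none_match(if_none_match)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client copy is current: send no body
    body = json.dumps(published_fields).encode("utf-8")
    return 200, headers, body
```

Because the ETag is derived from the content, republishing identical content keeps the same tag, and clients holding that version keep getting 304 responses.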

Question 5

How do you handle rich text content and structured data fields in a CMS schema?

Accepted Answer

CMS content types define fields with types: short text (VARCHAR), long text (TEXT), rich text (a structured document of paragraph, heading, image, and code block nodes, stored as a document tree in JSON and rendered by the frontend), number, boolean, date, asset reference (foreign key to assets table), entry reference (foreign key to entries for linked content), and array fields. Storage: field data is stored as a JSONB column on EntryVersion (flexible: no database schema migration is needed when fields are added). Validation: on each save, validate field data against the ContentType's JSON Schema (field names, required fields, type constraints). Rich text: store as a document AST (Abstract Syntax Tree) in JSON, an editor-independent format that rich text editors such as Slate, ProseMirror, and Tiptap can be mapped to and from. Avoid storing HTML directly: HTML ties the content to web rendering and is hard to render natively on other platforms such as mobile apps, whereas an AST can be rendered differently for each platform.
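To illustrate why an AST is easier to retarget than stored HTML, here is a small sketch that renders the same document tree to HTML and to plain text. The node shape (type, children, value, level, src, alt) is a made-up format for the example, not the schema of Slate, ProseMirror, Tiptap, or any particular CMS.

```python
import html
from typing import Any, Dict

Node = Dict[str, Any]


def render_html(node: Node) -> str:
    """Render a document tree to HTML for a web frontend."""
    kind = node["type"]
    if kind == "text":
        return html.escape(node["value"])
    if kind == "image":
        src = html.escape(node["src"])
        alt = html.escape(node.get("alt", ""))
        return '<img src="' + src + '" alt="' + alt + '">'
    inner = "".join(render_html(child) for child in node.get("children", []))
    if kind == "document":
        return inner
    if kind == "paragraph":
        return "<p>" + inner + "</p>"
    if kind == "heading":
        level = min(max(int(node.get("level", 1)), 1), 6)  # clamp to h1..h6
        return "<h%d>%s</h%d>" % (level, inner, level)
    if kind == "code_block":
        return "<pre><code>" + inner + "</code></pre>"
    raise ValueError("unknown node type: " + str(kind))


def render_plain(node: Node) -> str:
    """Render the same tree to plain text, e.g. for a notification or preview."""
    kind = node["type"]
    if kind == "text":
        return node["value"]
    if kind == "image":
        return node.get("alt", "")
    parts = [render_plain(child) for child in node.get("children", [])]
    if kind == "document":
        return "\n\n".join(parts)  # blank line between top-level blocks
    return "".join(parts)
```

A native mobile client would add a third renderer that maps the same node types to its own UI components, which is not possible in general when only HTML is stored.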

Low-Level Design: Content Management System — Publishing Workflow, Versioning, and Multi-Tenant

Core Entities

Publishing Workflow State Machine

Versioning and Draft Management

Content Delivery API and Multi-Tenancy