Core Entities
Tenant: tenant_id, name, slug, plan, custom_domain, storage_quota_gb, created_at. (Multi-tenant: each customer organization is a tenant with isolated content.) ContentType: type_id, tenant_id, name (e.g., “BlogPost”, “ProductPage”), schema (JSON Schema defining fields: title, body, author, tags, custom fields), is_singleton (for one-off pages like About Us). Entry: entry_id, tenant_id, content_type_id, status (DRAFT, IN_REVIEW, SCHEDULED, PUBLISHED, ARCHIVED), published_version_id, created_by, created_at, updated_at, scheduled_publish_at. EntryVersion: version_id, entry_id, version_number, data (JSONB — the field values for this version), created_by, created_at, publish_note. Asset: asset_id, tenant_id, filename, storage_key, mime_type, size_bytes, alt_text, metadata (JSONB), uploaded_by, created_at. Webhook: webhook_id, tenant_id, url, events (array: entry.published, entry.unpublished, asset.created), secret_key, is_active.
Publishing Workflow State Machine
class EntryService:
TRANSITIONS = {
"DRAFT": ["IN_REVIEW", "SCHEDULED", "PUBLISHED", "ARCHIVED"],
"IN_REVIEW": ["DRAFT", "SCHEDULED", "PUBLISHED", "ARCHIVED"],
"SCHEDULED": ["DRAFT", "PUBLISHED", "ARCHIVED"],
"PUBLISHED": ["DRAFT", "ARCHIVED"],
"ARCHIVED": ["DRAFT"],
}
def publish(self, entry_id: int, version_id: int,
actor_id: int) -> Entry:
with db.transaction():
entry = db.query(
"SELECT * FROM entries WHERE entry_id = %s FOR UPDATE",
entry_id
)
if "PUBLISHED" not in self.TRANSITIONS[entry.status]:
raise InvalidTransition(entry.status, "PUBLISHED")
db.execute(
"UPDATE entries SET status = 'PUBLISHED', "
"published_version_id = %s, "
"updated_at = NOW() WHERE entry_id = %s",
version_id, entry_id
)
# Invalidate CDN cache for this content
self.cdn.purge(entry.tenant_id, entry_id)
# Fire webhooks asynchronously
self.webhook_dispatcher.dispatch_async(
tenant_id=entry.tenant_id,
event="entry.published",
payload={"entry_id": entry_id, "version_id": version_id}
)
return db.get_entry(entry_id)
def schedule(self, entry_id: int, publish_at: datetime,
actor_id: int) -> Entry:
with db.transaction():
entry = db.query(
"SELECT * FROM entries WHERE entry_id = %s FOR UPDATE",
entry_id
)
db.execute(
"UPDATE entries SET status = 'SCHEDULED', "
"scheduled_publish_at = %s WHERE entry_id = %s",
publish_at, entry_id
)
# A background scheduler polls for SCHEDULED entries
# WHERE scheduled_publish_at <= NOW() and publishes them
return db.get_entry(entry_id)
Versioning and Draft Management
Every save creates a new EntryVersion. The Entry record points to the currently published version (published_version_id) and tracks the latest draft implicitly (the highest version_number with status=DRAFT). Version comparison: diff any two versions by comparing their JSONB data fields. Restore a previous version: create a new EntryVersion with the old version’s data (do not modify historical versions). Auto-save: save a draft version every 30 seconds while the editor is active (debounced). Distinguish auto-saves from manual saves with a is_autosave flag. Conflict detection: when two users edit simultaneously, use optimistic locking — include the base_version_id in save requests. If the current latest version differs from base_version_id, return a conflict error. The editor shows a diff and asks the user to resolve.
Content Delivery API and Multi-Tenancy
Content Delivery API (headless CMS mode): GET /api/{tenant_slug}/entries?content_type=BlogPost&status=PUBLISHED. Returns the published version’s data as JSON. Optimized for reads: cache published entry data in Redis (TTL = 5 minutes; invalidate on publish/unpublish). CDN caching: published content is served via CDN (Cloudflare) with Cache-Control: max-age=3600. On publish: CDN purge for the affected URLs. Multi-tenancy isolation: every database query includes tenant_id in the WHERE clause (enforced by a middleware layer — no query is allowed without tenant scope). Row-level security in PostgreSQL as defense-in-depth: CREATE POLICY tenant_isolation ON entries USING (tenant_id = current_setting(‘app.tenant_id’)). Asset storage: each tenant gets a scoped S3 prefix (s3://cms-assets/{tenant_id}/). Quota enforcement: check used_storage <= storage_quota_gb on each asset upload; reject with 402 if over quota.
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How does multi-tenancy isolation work in a CMS database?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Multi-tenancy isolation: every tenant's data must be completely invisible to other tenants. Two approaches: (1) Shared database with tenant_id column: every table has tenant_id. Every query includes WHERE tenant_id = current_tenant_id, enforced by application middleware. More scalable (fewer databases), but a missing WHERE clause can leak data. Defense-in-depth: PostgreSQL Row-Level Security (RLS) policy: CREATE POLICY tenant_isolation ON entries USING (tenant_id = current_setting('app.tenant_id')). Even if the application forgets the WHERE clause, the database policy enforces isolation. (2) Separate database per tenant: complete isolation (query mistakes cannot leak data), but harder to operate (thousands of databases). Typically used for enterprise/regulated customers. Most SaaS CMSes use option 1 for standard tenants and option 2 for enterprise.”}},{“@type”:”Question”,”name”:”How does scheduled publishing work in a CMS?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Scheduled publishing: an entry with status=SCHEDULED and scheduled_publish_at set should automatically transition to PUBLISHED at the scheduled time. Implementation: a background scheduler runs every minute (cron job or Celery Beat). It queries: SELECT * FROM entries WHERE status = 'SCHEDULED' AND scheduled_publish_at <= NOW(). For each result: run the publish() workflow (transition to PUBLISHED, invalidate CDN, fire webhooks). Idempotency: if the scheduler runs twice in a minute (due to a crash and restart), the second run re-processes already-published entries. Guard: publish() checks that status is still SCHEDULED before acting (with FOR UPDATE lock) — if already PUBLISHED, it is a no-op. Timezone handling: store scheduled_publish_at in UTC in the database; display in the editor in the user's local timezone.”}},{“@type”:”Question”,”name”:”How do webhooks work in a CMS and how do you guarantee delivery?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Webhooks notify external systems of CMS events (entry.published, asset.uploaded) via HTTP POST to a configured URL. Reliable delivery challenges: the target URL may be temporarily down. At-least-once delivery: store webhook deliveries in a webhook_deliveries table (webhook_id, event_id, status, attempt_count, next_retry_at, response_code). A dispatcher worker polls for pending deliveries, makes the HTTP POST, and updates status to DELIVERED (2xx) or FAILED. Retry with exponential backoff: retry after 1min, 5min, 30min, 2hr, 24hr. After 5 failures: mark as DEAD_LETTERED and alert the developer. HMAC signature: include an X-Webhook-Signature header (HMAC-SHA256 of the payload using the webhook's secret_key) so consumers can verify the payload was sent by the CMS and not tampered with.”}},{“@type”:”Question”,”name”:”How does a headless CMS deliver content and why is caching critical?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Headless CMS separates content management (backend) from content presentation (frontend). The delivery API returns structured JSON content that any frontend (website, mobile app, IoT device) consumes. Caching strategy: published content is immutable (until next publish). Cache aggressively: (1) Application layer cache: Redis with TTL=5min for published entry data. (2) CDN cache: Cloudflare/CloudFront with Cache-Control: max-age=3600. Each entry has a canonical URL: /api/{tenant}/entries/{entry_id}. On publish: purge both the Redis key and the CDN cache for that URL. Cache-busting with ETag: include an ETag (hash of the published version content) in the response. Clients send If-None-Match; server returns 304 Not Modified if unchanged, saving bandwidth. At scale: CDN cache handles 95%+ of reads, database is rarely hit.”}},{“@type”:”Question”,”name”:”How do you handle rich text content and structured data fields in a CMS schema?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”CMS content types define fields with types: short text (VARCHAR), long text (TEXT), rich text (structured document: paragraph, heading, image, code block — stored as a document tree in JSON, rendered by the frontend). Number, boolean, date, asset reference (foreign key to assets table), entry reference (foreign key to entries for linked content), and array fields. Storage: field data is stored as a JSONB column on EntryVersion (flexible — no schema migration needed when fields are added). Validation: on each save, validate field data against the ContentType's JSON Schema (field names, required fields, type constraints). Rich text: store as a document AST (Abstract Syntax Tree) in JSON — vendor-neutral format that any rich text editor (Slate, ProseMirror, Tiptap) can consume. Avoid storing HTML directly: HTML is renderer-specific and cannot be rendered differently for different platforms.”}}]}
See also: Atlassian Interview Prep
See also: Shopify Interview Prep
See also: LinkedIn Interview Prep