Core Entities
ContentType: type_id, name (Article, Page, Product), fields_schema (JSON schema defining the fields).
Content: content_id, type_id, slug, status (DRAFT, REVIEW, SCHEDULED, PUBLISHED, ARCHIVED), author_id, published_version_id, created_at, published_at, scheduled_at.
ContentVersion: version_id, content_id, version_number, field_values (JSON), changed_by, created_at.
User: user_id, role (ADMIN, EDITOR, AUTHOR, VIEWER).
Media: media_id, file_url, file_type, file_size_bytes, alt_text, uploaded_by.
Workflow: workflow_id, content_id, current_step, assigned_to, due_date.
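As a rough illustration, the core entities could be expressed as Python dataclasses. This is only a sketch: the concrete types (integer IDs, UTC datetimes) are assumptions, and published_version_id is taken from the versioning section rather than from the entity list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class Status(str, Enum):
    DRAFT = "DRAFT"
    REVIEW = "REVIEW"
    SCHEDULED = "SCHEDULED"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"


@dataclass
class ContentType:
    type_id: int
    name: str                      # e.g. "Article", "Page", "Product"
    fields_schema: dict[str, Any]  # JSON schema describing the fields


@dataclass
class Content:
    content_id: int
    type_id: int
    slug: str
    author_id: int
    status: Status = Status.DRAFT
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    published_at: Optional[datetime] = None
    scheduled_at: Optional[datetime] = None
    published_version_id: Optional[int] = None  # points at a ContentVersion


@dataclass(frozen=True)  # frozen: a version is never modified after creation
class ContentVersion:
    version_id: int
    content_id: int
    version_number: int
    field_values: dict[str, Any]
    changed_by: int
    created_at: datetime
```

Marking ContentVersion as frozen mirrors the immutable-history rule: attribute assignment on an existing version raises an error instead of silently rewriting it.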
Content Versioning
Every save creates a new ContentVersion record, and the currently published version is referenced by Content.published_version_id. Versions are keyed by content_id + version_number, where version_number auto-increments per content item. Store field_values as JSON so that content types with different fields can share one table. Diff computation: on each save, compute a diff against the previous version for the changelog display (a JSON diff library handles this). Version restore: create a new version with the field_values of the target old version; never overwrite existing versions, so the history stays immutable. Version pruning: keep the last N drafts plus all published versions, and prune intermediate drafts older than 90 days to manage storage. Auto-save: save a draft version every 60 seconds while the editor is active to prevent work loss. To keep the version history readable, show only user-triggered saves in the history UI.
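The append-only behaviour described above can be sketched with a small in-memory class. VersionHistory and its method names are hypothetical, and the diff is a shallow field-level comparison rather than a full JSON diff library.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Version:
    version_number: int
    field_values: dict[str, Any]
    changed_by: int


class VersionHistory:
    """Append-only version list for one content item (hypothetical helper)."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def save(self, field_values: dict[str, Any], changed_by: int) -> Version:
        # Deep-copy so later edits to the caller's dict cannot alter history.
        version = Version(len(self._versions) + 1, copy.deepcopy(field_values), changed_by)
        self._versions.append(version)
        return version

    def get(self, version_number: int) -> Version:
        if not 1 <= version_number <= len(self._versions):
            raise KeyError(f"no version {version_number}")
        return self._versions[version_number - 1]

    def restore(self, version_number: int, changed_by: int) -> Version:
        # Restoring appends a NEW version; existing entries are never rewritten.
        return self.save(self.get(version_number).field_values, changed_by)

    def diff(self, old: int, new: int) -> dict[str, tuple[Any, Any]]:
        # Shallow diff: {field: (old_value, new_value)} for changed fields only.
        a = self.get(old).field_values
        b = self.get(new).field_values
        return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```

Because restore goes through save, the restore event itself shows up in the history with its own version_number and changed_by.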
Role-Based Permissions
AUTHOR: create drafts, edit own content, submit for review.
EDITOR: review and approve any content, publish, schedule, archive.
ADMIN: all editor permissions + manage users, create content types, configure workflows.
VIEWER: read-only access to published content.
Permission checks: every API endpoint validates the user role before executing. Content ownership: authors can only edit their own content (unless EDITOR/ADMIN). Row-level permissions for a multi-tenant CMS: add organization_id to all entities and enforce tenant isolation in every query.
Approval workflow: AUTHOR submits for review (status=REVIEW). EDITOR receives a notification. EDITOR approves (status=PUBLISHED) or rejects with comments (status=DRAFT). Track workflow history in an audit log.
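One way to combine the role rules, ownership check, tenant isolation, and approval workflow is a transition table keyed by (status, action). The sketch below is illustrative only: the action names, the TRANSITIONS table, and the choice to let editors and admins submit content are assumptions, not part of the original design.

```python
from dataclasses import dataclass, field

# (current_status, action) -> (next_status, roles allowed to perform the action)
TRANSITIONS = {
    ("DRAFT", "submit"): ("REVIEW", {"AUTHOR", "EDITOR", "ADMIN"}),
    ("REVIEW", "approve"): ("PUBLISHED", {"EDITOR", "ADMIN"}),
    ("REVIEW", "reject"): ("DRAFT", {"EDITOR", "ADMIN"}),
    ("PUBLISHED", "archive"): ("ARCHIVED", {"EDITOR", "ADMIN"}),
}


@dataclass
class User:
    user_id: int
    role: str
    organization_id: int


@dataclass
class Content:
    content_id: int
    author_id: int
    organization_id: int
    status: str = "DRAFT"
    audit_log: list = field(default_factory=list)


def can_edit(user: User, content: Content) -> bool:
    if user.organization_id != content.organization_id:
        return False  # tenant isolation comes first
    if user.role in {"EDITOR", "ADMIN"}:
        return True
    return user.role == "AUTHOR" and content.author_id == user.user_id


def apply_action(user: User, content: Content, action: str, comment: str = "") -> None:
    rule = TRANSITIONS.get((content.status, action))
    if rule is None:
        raise ValueError(f"cannot {action} content in status {content.status}")
    next_status, allowed_roles = rule
    if user.role not in allowed_roles or not can_edit(user, content):
        raise PermissionError(f"{user.role} may not {action} this content")
    # Record the transition before changing state (from, to, actor, comment).
    content.audit_log.append((content.status, next_status, user.user_id, comment))
    content.status = next_status
```

Keeping the rules in one table means a new workflow step is a data change rather than another branch in the endpoint code.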
Rich Text Storage
Store rich text content as structured JSON, not raw HTML. Common formats: the ProseMirror document model, Slate.js, or Lexical (Meta). Example: {"type": "doc", "content": [{"type": "paragraph", "content": [{"type": "text", "text": "Hello", "marks": [{"type": "bold"}]}]}]}. Benefits: programmatic manipulation without HTML parsing, meaningful version diffing (diffs can be computed at the node level), safe rendering (no XSS as long as the JSON is rendered by a trusted renderer), and support for collaborative editing (for example CRDT-based merging). Convert to HTML for delivery to the frontend or email. Store the rendered HTML as a computed field (cached on publish) for performance. Never store user-provided HTML directly without sanitization (XSS risk).
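To show why a trusted renderer avoids XSS, here is a minimal sketch that walks a document tree like the example above and emits HTML. The MARK_TAGS and BLOCK_TAGS allowlists are assumptions chosen for illustration; real editors define a much larger schema.

```python
from html import escape

# Allowlists: only these node and mark types ever produce tags (assumed schema).
MARK_TAGS = {"bold": "strong", "italic": "em"}
BLOCK_TAGS = {"paragraph": "p", "blockquote": "blockquote"}


def render(node: dict) -> str:
    kind = node.get("type")
    if kind == "text":
        # Text is always escaped, so markup inside it is shown, never executed.
        out = escape(node.get("text", ""))
        for mark in node.get("marks", []):
            tag = MARK_TAGS.get(mark.get("type"))
            if tag:  # unknown marks are silently dropped
                out = f"<{tag}>{out}</{tag}>"
        return out
    children = "".join(render(child) for child in node.get("content", []))
    tag = BLOCK_TAGS.get(kind)
    # "doc" and unknown block types contribute no wrapper tag of their own.
    return f"<{tag}>{children}</{tag}>" if tag else children


doc = {"type": "doc", "content": [{"type": "paragraph", "content": [
    {"type": "text", "text": "Hello", "marks": [{"type": "bold"}]}]}]}
print(render(doc))  # <p><strong>Hello</strong></p>
```

The renderer never copies tag names or attributes from the stored document into the output; everything that reaches the page is either an allowlisted tag or escaped text.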
Publishing Workflow and Scheduling
Scheduled publishing: set status=SCHEDULED and Content.scheduled_at to a future datetime (stored in UTC). A background job runs every minute and queries for SCHEDULED content where scheduled_at <= NOW(). For each match it sets status=PUBLISHED and published_at=NOW(), clears scheduled_at, and sets published_version_id to the current draft version, all in a single transaction so a half-published item is never visible. Publish to CDN: on publish, invalidate the CDN cache for the content URL (purge the old version). Write the new content to a static file (for static site generators) or update the database. Multi-site publishing: one content item may be published to multiple sites (different domains, languages). Site-specific overrides: a content variant per site with different field values. Publish triggers: a webhook or Kafka event notifies downstream systems (CDN invalidation, search index update, email notification to subscribers).
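The scheduler step can be sketched as a single guarded UPDATE, shown here against SQLite for self-containment. The table layout is an assumption: timestamps are stored as integer Unix epoch seconds (UTC), and draft_version_id is a hypothetical column holding the current draft version.

```python
import sqlite3
from datetime import datetime, timezone


def publish_due(conn: sqlite3.Connection, now: datetime) -> int:
    """Publish every SCHEDULED row whose time has come; return the row count."""
    now_ts = int(now.timestamp())
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            """
            UPDATE content
               SET status = 'PUBLISHED',
                   published_at = ?,
                   scheduled_at = NULL,
                   published_version_id = draft_version_id
             WHERE status = 'SCHEDULED' AND scheduled_at <= ?
            """,
            (now_ts, now_ts),
        )
    return cur.rowcount


# Example setup with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (content_id INTEGER PRIMARY KEY, status TEXT, "
    "scheduled_at INTEGER, published_at INTEGER, "
    "draft_version_id INTEGER, published_version_id INTEGER)"
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ts = int(now.timestamp())
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "SCHEDULED", ts - 60, None, 11, None),  # due
        (2, "SCHEDULED", ts + 60, None, 21, None),  # not yet due
        (3, "DRAFT", None, None, 31, None),         # never scheduled
    ],
)
first_run = publish_due(conn, now)
second_run = publish_due(conn, now)
```

Because the WHERE clause requires status = 'SCHEDULED', a second overlapping run finds nothing left to publish, which gives the job a basic form of idempotency. CDN invalidation and event publishing would follow after the commit and are left out here.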
Media Management
Media upload flow: the client requests a presigned S3 URL from the API, then uploads directly to S3 (bypassing the application servers). On upload completion, S3 emits an object-created event notification (for example to SQS, SNS, or Lambda), and the API creates a Media record (file_url, file_type, dimensions for images). Image processing: on upload, trigger a Lambda function to generate multiple sizes (thumbnail 200×200, medium 800px width, large 1600px width) using ImageMagick or Sharp. Store all sizes in S3 with deterministic paths (media_id/800w.jpg). Serve images via CDN. Video: trigger a transcoding job, as in a typical video processing pipeline. Media search: an Elasticsearch index on alt_text, tags, and file_type powers the media library search. Unused media cleanup: periodically identify media not referenced by any content version and flag it for deletion, with a review step rather than automatic deletion.
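Two pieces of this flow are easy to sketch without cloud dependencies: deterministic rendition paths and the unused-media scan. The sketch assumes content versions reference media through a "media_id" key somewhere in their field_values JSON; both function names are hypothetical.

```python
from typing import Any, Iterable

# Target widths in pixels for each rendition (the thumbnail is 200x200).
RENDITION_WIDTHS = {"thumbnail": 200, "medium": 800, "large": 1600}


def rendition_keys(media_id: str, ext: str = "jpg") -> dict[str, str]:
    """Deterministic storage keys such as 'm42/800w.jpg'."""
    return {name: f"{media_id}/{width}w.{ext}" for name, width in RENDITION_WIDTHS.items()}


def _collect_media_ids(value: Any, found: set) -> None:
    # Walk nested dicts and lists, recording every value stored under "media_id".
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "media_id":
                found.add(child)
            else:
                _collect_media_ids(child, found)
    elif isinstance(value, list):
        for child in value:
            _collect_media_ids(child, found)


def find_unused_media(all_media_ids: Iterable[str], versions: Iterable[dict]) -> set:
    """Return media IDs not referenced by any version; candidates for review."""
    referenced: set = set()
    for field_values in versions:
        _collect_media_ids(field_values, referenced)
    return set(all_media_ids) - referenced
```

The scan covers every version, not only the published one, so media used by an old version that might be restored is never flagged.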
Frequently Asked Questions
How does content versioning work in a CMS?
Content versioning stores every save as an immutable snapshot. Schema: content table (the current metadata) + content_versions table (field_values JSON per version). On each save: INSERT a new content_version row with the current field values and increment the version_number. Never update or delete existing versions: they are the immutable audit trail. The current published version is referenced by content.published_version_id. Restoring an old version: create a new version with the same field_values as the target old version (new row, new version_number). This preserves the restore event in the history. Auto-save: save a draft version every 60 seconds while the editor is typing, but only show named (user-triggered) saves in the version history UI to avoid cluttering it with hundreds of auto-save entries.
How do you implement a draft-to-publish workflow with approvals?
Workflow states: DRAFT -> IN_REVIEW -> APPROVED -> PUBLISHED (or REJECTED -> DRAFT). Role actions: AUTHOR creates a draft, submits for review (transition to IN_REVIEW, assigns to an EDITOR). EDITOR reviews: can approve (transition to APPROVED) or reject with comments (transition back to DRAFT). ADMIN or EDITOR publishes the approved content. Notifications: on state transition, notify the relevant user (EDITOR notified when review is requested; AUTHOR notified when approved or rejected). Workflow history: store each transition (from_state, to_state, actor_id, comment, timestamp) for audit purposes. Multi-step workflows: some enterprises require two approvals (junior editor + senior editor). Model as a workflow_steps table with ordered steps and separate approval tracking per step.
How do you store and render rich text content safely?
Never store user-provided HTML directly: it is an XSS risk if the HTML is rendered without sanitization. Instead: (1) Store content as structured JSON (ProseMirror, Slate, or Lexical document model). The JSON represents the document tree (nodes: paragraph, heading, bold text, links). (2) On rendering: convert the JSON to HTML using a trusted renderer (server-side or client-side library). The renderer never allows arbitrary HTML injection; it only renders the node types defined in the schema. (3) If raw HTML must be accepted (legacy data): sanitize with a library (DOMPurify for client-side, html-sanitizer for server-side) before storage. Allowlist: permit only safe tags and attributes (p, h1-h6, strong, em, a[href], img[src, alt]). Strip: script, iframe, on* event handlers, javascript: URLs. Storing structured JSON gives the best security and extensibility.
How do you implement scheduled content publishing?
Scheduled publishing sets a future publish time. On schedule: set content.status = SCHEDULED, content.scheduled_at = future_datetime. A scheduler job (cron or Airflow DAG) runs every minute: SELECT content WHERE status='SCHEDULED' AND scheduled_at <= NOW(). For each result: (1) Set status = PUBLISHED, published_at = NOW(). (2) Set published_version_id to the current draft version_id. (3) Trigger CDN cache invalidation for the content URL. (4) Fire a Kafka event (ContentPublished) for downstream consumers (search index, email newsletter). (5) Clear scheduled_at. Idempotency: use a database lock or unique constraint to prevent two scheduler runs from publishing the same content twice if they overlap. Handle time zones: store scheduled_at in UTC; convert to the user timezone only for display.
How do you handle multi-site or multi-language publishing?
Multi-site: one CMS instance serves multiple sites (e.g., US, UK, EU brands). Content may be shared or site-specific. Model: site table + content_site_mapping (content_id, site_id, site_specific_overrides JSON). On publish: specify which sites to publish to. Each site gets its own CDN path (/us/, /uk/). Shared content: base content in the main content record. Site-specific overrides stored in content_site_mapping. Multi-language (i18n): content has a default_locale and translations. Translation table: (content_id, locale, field_values JSON). Example: content_id=42 has an English version (field_values) and a French translation (field_values with French text). Translation workflow: content is created in the default locale, sent to translators (via a translation management system integration or manual assignment), approved translations are published per locale. URL structure: /en/article-slug, /fr/article-slug.
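The multi-site and multi-language model described above comes down to layering field values at read time. The sketch below assumes a particular precedence (base fields, then the locale's translation, then the site's overrides); that ordering and the function names are illustrative choices, not something the design prescribes.

```python
from typing import Any


def resolve_fields(
    base: dict[str, Any],
    translations: dict[str, dict[str, Any]],  # locale -> translated field_values
    overrides: dict[str, dict[str, Any]],     # site_id -> site_specific_overrides
    locale: str,
    site_id: str,
) -> dict[str, Any]:
    """Merge layers so that later ones win: base < translation < site override."""
    resolved = dict(base)
    resolved.update(translations.get(locale, {}))  # missing locale: keep base text
    resolved.update(overrides.get(site_id, {}))    # missing site: no overrides
    return resolved


def content_url(locale: str, slug: str) -> str:
    """Locale-prefixed URL, e.g. /fr/article-slug."""
    return f"/{locale}/{slug}"


base = {"title": "Hello", "cta": "Buy now"}
translations = {"fr": {"title": "Bonjour"}}
overrides = {"uk": {"cta": "Shop now"}}
french_uk = resolve_fields(base, translations, overrides, locale="fr", site_id="uk")
```

Falling back to the base fields when a translation or override is missing keeps a page renderable while translations are still in progress.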