Low-Level Design: Content Management System — Drafts, Versioning, Roles, and Publishing Workflow

Core Entities

ContentType: type_id, name (Article, Page, Product), fields_schema (JSON schema defining the fields).
Content: content_id, type_id, slug, status (DRAFT, REVIEW, SCHEDULED, PUBLISHED, ARCHIVED), author_id, published_version_id, created_at, published_at, scheduled_at.
ContentVersion: version_id, content_id, version_number, field_values (JSON), changed_by, created_at.
User: user_id, role (ADMIN, EDITOR, AUTHOR, VIEWER).
Media: media_id, file_url, file_type, file_size_bytes, alt_text, uploaded_by.
Workflow: workflow_id, content_id, current_step, assigned_to, due_date.
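As a rough illustration, three of these entities could be modeled as Python dataclasses. This is a sketch only: the enum classes, the defaults, and the choice to freeze ContentVersion are assumptions made for the example, not something the entity list prescribes.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class Status(Enum):
    DRAFT = "DRAFT"
    REVIEW = "REVIEW"
    SCHEDULED = "SCHEDULED"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"


class Role(Enum):
    ADMIN = "ADMIN"
    EDITOR = "EDITOR"
    AUTHOR = "AUTHOR"
    VIEWER = "VIEWER"


@dataclass
class Content:
    content_id: int
    type_id: int
    slug: str
    author_id: int
    created_at: datetime
    status: Status = Status.DRAFT              # assumed default: new content starts as a draft
    published_at: Optional[datetime] = None
    scheduled_at: Optional[datetime] = None
    published_version_id: Optional[int] = None  # pointer to the live ContentVersion


@dataclass(frozen=True)  # frozen: a saved version is never mutated
class ContentVersion:
    version_id: int
    content_id: int
    version_number: int
    field_values: dict[str, Any]
    changed_by: int
    created_at: datetime


@dataclass
class User:
    user_id: int
    role: Role
```

Freezing ContentVersion makes accidental in-place edits fail loudly, which matches the immutable-history rule in the versioning section.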

Content Versioning

Every explicit save creates a new ContentVersion record. The currently published version is referenced by Content.published_version_id. Versions are keyed by content_id + version_number, where version_number increments per content item; a unique constraint on (content_id, version_number) keeps concurrent saves from producing duplicates. Store field_values as JSON so that content types with different fields can share one table. Diff computation: on each version save, compute a diff against the previous version for the changelog display (a JSON diff library works well here). Version restore: create a new version carrying the field_values of the target old version; never overwrite existing versions, so history stays immutable. Version pruning: keep the last N drafts plus all published versions, and prune intermediate drafts older than 90 days to manage storage. Auto-save: save the working draft every 60 seconds while the editor is active to prevent work loss. To keep the history uncluttered, auto-saves should update a single working draft (or be the first candidates for pruning) rather than each becoming a permanent numbered version.
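The append-only save and restore rules can be sketched with a small in-memory store. The class and function names here (VersionStore, changed_fields) are hypothetical, the top-level diff is only a stand-in for a real JSON diff library, and pruning is left out to keep the example short.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Version:
    content_id: int
    version_number: int
    field_values: dict[str, Any]
    changed_by: int


class VersionStore:
    """Append-only, in-memory version history keyed by content_id."""

    def __init__(self) -> None:
        self._versions: dict[int, list[Version]] = {}

    def save(self, content_id: int, field_values: dict[str, Any], changed_by: int) -> Version:
        history = self._versions.setdefault(content_id, [])
        # version_number increments per content item, starting at 1
        version = Version(content_id, len(history) + 1, copy.deepcopy(field_values), changed_by)
        history.append(version)
        return version

    def get(self, content_id: int, version_number: int) -> Version:
        return self._versions[content_id][version_number - 1]

    def restore(self, content_id: int, version_number: int, changed_by: int) -> Version:
        # Restoring never rewrites history: it appends a copy of the old values.
        target = self.get(content_id, version_number)
        return self.save(content_id, target.field_values, changed_by)


def changed_fields(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple]:
    """Top-level diff: maps each changed key to (old_value, new_value)."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Restoring version 1 after two saves yields version 3 with version 1's field values, while versions 1 and 2 remain untouched.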

Role-Based Permissions

AUTHOR: create drafts, edit own content, submit for review. EDITOR: review and approve any content, publish, schedule, archive. ADMIN: all EDITOR permissions plus managing users, creating content types, and configuring workflows. VIEWER: read-only access to published content. Permission checks: every API endpoint validates the user's role before executing. Content ownership: an AUTHOR can edit only their own content; EDITOR and ADMIN can edit any content. Row-level permissions for a multi-tenant CMS: add organization_id to all entities and enforce tenant isolation in every query. Approval workflow: the AUTHOR submits for review (status=REVIEW) and an EDITOR is notified. The EDITOR either approves (status=PUBLISHED) or rejects with comments (status=DRAFT). Record every transition in an audit log.
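One way to express the role rules and the approval workflow is a permission table plus a small state machine. The permission names, the TRANSITIONS table, and the helper functions below are illustrative assumptions; the source only fixes the roles, the statuses, and who may do what.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "VIEWER"
    AUTHOR = "AUTHOR"
    EDITOR = "EDITOR"
    ADMIN = "ADMIN"


_EDITOR = {"read_published", "edit_any", "review", "publish", "schedule", "archive"}

# Permission names are hypothetical labels for the capabilities listed above.
ROLE_PERMISSIONS = {
    Role.VIEWER: {"read_published"},
    Role.AUTHOR: {"read_published", "create_draft", "edit_own", "submit_for_review"},
    Role.EDITOR: _EDITOR,
    Role.ADMIN: _EDITOR | {"manage_users", "create_content_type", "configure_workflow"},
}

# (current status, action) -> (new status, permission required)
TRANSITIONS = {
    ("DRAFT", "submit"): ("REVIEW", "submit_for_review"),
    ("REVIEW", "approve"): ("PUBLISHED", "publish"),
    ("REVIEW", "reject"): ("DRAFT", "review"),
}


def can_edit(role: Role, user_id: int, author_id: int) -> bool:
    """Authors may edit only their own content; editors and admins may edit any."""
    perms = ROLE_PERMISSIONS[role]
    return "edit_any" in perms or ("edit_own" in perms and user_id == author_id)


def apply_action(role: Role, status: str, action: str) -> str:
    """Return the new status, or raise if the move or the role is not allowed."""
    key = (status, action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action} content in status {status}")
    new_status, required = TRANSITIONS[key]
    if required not in ROLE_PERMISSIONS[role]:
        raise PermissionError(f"{role.value} may not {action}")
    return new_status
```

Keeping transitions in a table makes it easy to log each (status, action, new status) triple to the audit log in one place.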

Rich Text Storage

Store rich text content as structured JSON, not raw HTML. Common formats: the ProseMirror document model, Slate.js, or Lexical (Meta). Example: {"type": "doc", "content": [{"type": "paragraph", "content": [{"type": "text", "text": "Hello", "marks": [{"type": "bold"}]}]}]}. Benefits: programmatic manipulation without HTML parsing; meaningful version diffs at the node level; safer rendering, because a trusted renderer that escapes text and validates attributes such as link URLs leaves no path for injected markup; and a good fit for collaborative editing (for example, CRDT-based merging). Convert to HTML for delivery to the frontend or email, and cache the rendered HTML as a computed field on publish for performance. Never store user-provided HTML directly without sanitization (XSS risk).
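A minimal renderer for the example document might look like the sketch below. It handles only the node and mark types in the example plus an assumed "italic" mark, and the tag mappings are assumptions rather than any editor library's actual output. Unknown node types fall through to their children so that nothing unrecognized is emitted as markup.

```python
from html import escape

# Assumed mappings from node/mark types to HTML tags.
MARK_TAGS = {"bold": "strong", "italic": "em"}
BLOCK_TAGS = {"paragraph": "p"}


def render(node: dict) -> str:
    """Render a document node to HTML, escaping all text content."""
    node_type = node.get("type")
    if node_type == "text":
        html = escape(node.get("text", ""))
        for mark in node.get("marks", []):
            tag = MARK_TAGS.get(mark.get("type"))
            if tag:  # unknown marks are ignored rather than emitted
                html = f"<{tag}>{html}</{tag}>"
        return html
    inner = "".join(render(child) for child in node.get("content", []))
    tag = BLOCK_TAGS.get(node_type)
    # "doc" and unknown node types contribute no wrapper of their own.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


doc = {
    "type": "doc",
    "content": [
        {
            "type": "paragraph",
            "content": [{"type": "text", "text": "Hello", "marks": [{"type": "bold"}]}],
        }
    ],
}
print(render(doc))  # <p><strong>Hello</strong></p>
```

Because text only ever passes through escape(), a text node containing a script tag is rendered as inert characters rather than executable markup.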

Publishing Workflow and Scheduling

Scheduled publishing: set Content.scheduled_at to a future datetime and status=SCHEDULED. A background job runs every minute and queries for SCHEDULED content where scheduled_at <= NOW(). For each match, in a single transaction, it sets status=PUBLISHED and published_at=NOW(), clears scheduled_at, and sets published_version_id to the current draft version. Publish to CDN: on publish, first write the new content (to a static file for static site generators, or to the database), then invalidate the CDN cache for the content URL so the old version is purged. Multi-site publishing: one content item may be published to multiple sites (different domains or languages), with site-specific overrides stored as a content variant per site with different field values. Publish triggers: emit a webhook or Kafka event to notify downstream systems (CDN invalidation, search index update, email notification to subscribers).
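The scheduled-publish job can be sketched against SQLite as follows. The table layout, the draft_version_id column (standing in for "the current draft version"), and the use of UTC ISO-8601 strings for timestamps are assumptions made for the example; a production job would also emit the publish event for each returned id.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical minimal schema; timestamps are UTC ISO-8601 strings,
# which compare correctly as text when they share one format.
SCHEMA = """
CREATE TABLE content (
    content_id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    scheduled_at TEXT,
    published_at TEXT,
    published_version_id INTEGER,
    draft_version_id INTEGER
)
"""


def publish_due(conn: sqlite3.Connection, now: datetime) -> list[int]:
    """Publish every SCHEDULED row whose scheduled_at has passed; return their ids."""
    ts = now.isoformat()
    with conn:  # commits on success, rolls back if anything raises
        rows = conn.execute(
            "SELECT content_id FROM content "
            "WHERE status = 'SCHEDULED' AND scheduled_at <= ? ORDER BY content_id",
            (ts,),
        ).fetchall()
        ids = [row[0] for row in rows]
        for content_id in ids:
            conn.execute(
                "UPDATE content SET status = 'PUBLISHED', published_at = ?, "
                "scheduled_at = NULL, published_version_id = draft_version_id "
                "WHERE content_id = ? AND status = 'SCHEDULED'",
                (ts, content_id),
            )
    return ids
```

The status guard in the UPDATE makes the job safe to re-run: a second pass at the same time finds nothing left to publish.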

Media Management

Media upload flow: the client requests a presigned S3 URL from the API, then uploads directly to S3, bypassing the application servers. On upload completion, an S3 event notification (delivered to SQS, SNS, Lambda, or EventBridge; S3 does not call arbitrary webhooks itself) triggers the API to create a Media record (file_url, file_type, dimensions for images). Image processing: on upload, trigger a Lambda function to generate multiple sizes (thumbnail 200×200, medium 800px width, large 1600px width) using ImageMagick or Sharp. Store all sizes in S3 with deterministic paths (media_id/800w.jpg) and serve images via CDN. Video: trigger an asynchronous transcoding job, as in a typical video processing pipeline. Media search: an Elasticsearch index on alt_text, tags, and file_type powers the media library search. Unused media cleanup: periodically identify media not referenced by any content version and flag it for deletion, with a manual review step rather than automatic deletion.
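Two pieces of this flow are easy to sketch without touching S3: the deterministic rendition paths and the unused-media scan. The helper names are hypothetical, and the scan assumes that content versions reference media through a "media_id" key somewhere inside field_values, which the source does not specify.

```python
from typing import Any, Iterable


def rendition_key(media_id: str, name: str, ext: str = "jpg") -> str:
    """Deterministic storage path for a rendition, e.g. 'm1/800w.jpg'."""
    return f"{media_id}/{name}.{ext}"


def scaled_size(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Scale to target_width preserving aspect ratio; never upscale."""
    if width <= target_width:
        return (width, height)
    return (target_width, round(height * target_width / width))


def referenced_media_ids(value: Any) -> set[str]:
    """Recursively collect every 'media_id' found in nested dicts and lists."""
    found: set[str] = set()
    if isinstance(value, dict):
        if "media_id" in value:
            found.add(value["media_id"])
        for child in value.values():
            found |= referenced_media_ids(child)
    elif isinstance(value, list):
        for child in value:
            found |= referenced_media_ids(child)
    return found


def find_unused_media(all_media_ids: Iterable[str], versions: Iterable[dict]) -> set[str]:
    """Media ids not referenced by any version; candidates for review, not deletion."""
    referenced: set[str] = set()
    for field_values in versions:
        referenced |= referenced_media_ids(field_values)
    return set(all_media_ids) - referenced
```

The scan has to cover all retained versions, not just the published one, so that restoring an old version never points at a deleted file.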
