What Is Collaborative Document Editing?
Real-time collaborative editing lets multiple users edit the same document simultaneously, with each user’s changes appearing on everyone else’s screen within milliseconds. Google Docs, Notion, and Figma are examples. The core engineering challenge is reconciling concurrent, conflicting edits without losing any user’s work.
The Concurrency Problem
User A types “Hello” at position 0. Simultaneously, User B types “World” at position 0. If each client naively applies the other’s edit exactly as sent, A ends up with “WorldHello” and B with “HelloWorld”: the replicas diverge, and character-by-character interleaving can garble the text further. Two solutions dominate: Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDTs).
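To make the divergence concrete, here is a minimal Python sketch of two replicas applying the same two edits in different orders with no reconciliation. The `insert` helper and the replica variable names are illustrative only.

```python
def insert(doc: str, pos: int, text: str) -> str:
    """Insert text at pos with no awareness of concurrent edits."""
    return doc[:pos] + text + doc[pos:]


start = ""

# Replica A applies its own edit first, then B's edit exactly as received.
replica_a = insert(insert(start, 0, "Hello"), 0, "World")

# Replica B applies its own edit first, then A's edit exactly as received.
replica_b = insert(insert(start, 0, "World"), 0, "Hello")

print(replica_a)  # WorldHello
print(replica_b)  # HelloWorld
```

Both replicas saw the same two operations, yet they end with different text. OT and CRDTs are two ways of guaranteeing that this cannot happen.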
Operational Transformation (OT)
Every edit is expressed as an operation: INSERT(pos, char) or DELETE(pos). When two concurrent operations arrive, they must be transformed against each other to account for the offset changes the other operation introduces.
Example:
Doc: "AC"
Op1 (A): INSERT(1, "B") → "ABC"
Op2 (B): INSERT(1, "X") → "AXC"
At server: receive Op1 first. Transform Op2 against Op1:
Op2 is at position 1; Op1 inserted one character at position 1, which is at or before Op2’s position, so Op2 shifts right by one to position 2.
Op2' = INSERT(2, "X")
Apply Op1 then Op2': "AC" → "ABC" → "ABXC" ✓
Both clients converge to "ABXC".
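The transform step in the example above can be sketched in Python. This is a minimal sketch for insert-only operations; `Insert`, `apply_insert`, `transform_insert`, and the `other_has_priority` flag are illustrative names, not the API of any particular OT library. The tie-break assumes the op the server committed first wins when two inserts target the same position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply_insert(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform_insert(op: Insert, other: Insert, other_has_priority: bool) -> Insert:
    """Rewrite op so it can be applied after other has already been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other_has_priority):
        return Insert(op.pos + len(other.text), op.text)
    return op


op1 = Insert(1, "B")  # from client A
op2 = Insert(1, "X")  # from client B, concurrent with op1

# Server: op1 was committed first, so it wins the tie and op2 shifts right.
op2_prime = transform_insert(op2, op1, other_has_priority=True)
server_doc = apply_insert(apply_insert("AC", op1), op2_prime)

# Client A: already applied op1, then receives the transformed op2 from the server.
client_a = apply_insert(apply_insert("AC", op1), op2_prime)

# Client B: already applied op2 locally, then receives op1.
# op1 has priority over B's own op, so it keeps its position.
op1_prime = transform_insert(op1, op2, other_has_priority=False)
client_b = apply_insert(apply_insert("AC", op2), op1_prime)

print(server_doc, client_a, client_b)  # ABXC ABXC ABXC
```

Deletes and multi-character ranges need additional transform cases, which is where most of the real-world complexity comes from.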
Most production OT systems rely on a central server to serialize and transform concurrent operations. Complexity grows with the number of concurrent users and with richer operation types (multi-character inserts, rich-text formatting, undo). Google Docs uses OT with a central server.
CRDTs (Conflict-free Replicated Data Types)
CRDTs encode ordering information within the operations themselves, eliminating the need for transformation. Each character gets a unique identifier (a logical timestamp or fractional index) that determines its position regardless of the order in which operations arrive. Deletions mark characters as “tombstoned” rather than removing them. Applying the same set of operations in any order converges to the same result, so no central server is required for convergence. Used by: Figma (which describes its approach as CRDT-inspired and still routes changes through a server), Linear, and peer-to-peer editors. Trade-off: CRDTs accumulate tombstones and metadata, so document size grows over time and periodic compaction is required.
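A toy sequence CRDT illustrates the idea. This is a simplified sketch, not how any production library is implemented: the `Replica` class and its methods are hypothetical, identifiers are `(fractional position, site, counter)` tuples, and it ignores the interleaving and equal-position edge cases that real CRDTs handle.

```python
from fractions import Fraction


class Replica:
    def __init__(self, site: str):
        self.site = site
        self.counter = 0
        self.chars = {}          # id -> character; entries are never removed
        self.tombstones = set()  # ids of deleted characters

    def _visible(self):
        return [i for i in sorted(self.chars) if i not in self.tombstones]

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible())

    def local_insert(self, index: int, ch: str):
        ordered, visible = sorted(self.chars), self._visible()
        if index > 0:
            # Left neighbour: the visible char before the cursor.
            # Right neighbour: whatever id follows it, tombstoned or not.
            k = ordered.index(visible[index - 1])
            left = ordered[k][0]
            right = ordered[k + 1][0] if k + 1 < len(ordered) else Fraction(1)
        else:
            left = Fraction(0)
            right = ordered[0][0] if ordered else Fraction(1)
        self.counter += 1
        op = ("ins", ((left + right) / 2, self.site, self.counter), ch)
        self.apply(op)
        return op

    def local_delete(self, index: int):
        op = ("del", self._visible()[index], None)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        kind, char_id, ch = op
        if kind == "ins":
            self.chars[char_id] = ch
        else:
            self.tombstones.add(char_id)  # tombstone; the entry stays in chars


a, b, c = Replica("A"), Replica("B"), Replica("C")
setup = [a.local_insert(0, "H"), a.local_insert(1, "i")]
for op in setup:
    b.apply(op)

op_a = a.local_insert(2, "!")  # concurrent with op_b
op_b = b.local_delete(0)       # concurrent with op_a
a.apply(op_b)
b.apply(op_a)

# Replica C receives everything in a scrambled order, delete before insert.
for op in [op_b, op_a, setup[1], setup[0]]:
    c.apply(op)

print(a.text(), b.text(), c.text())  # i! i! i!
```

All three replicas still hold three entries in `chars` even though only two characters are visible, which is the tombstone overhead mentioned above.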
System Architecture (OT-based, Google Docs style)
Client → WebSocket → Collaboration Server (OT engine) → PostgreSQL (doc state + op log)
→ Pub/Sub → Other Clients
- Client generates operation, sends to server with current revision number
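- Server serializes ops per document (for example with a short per-document lock or an optimistic revision check), then transforms the incoming op against any ops committed since that revision
- Applies the transformed op, increments the revision, and persists it to the DB
- Acknowledges the sender and broadcasts the transformed operation to all other connected clients via pub/sub (Redis)
- Clients transform the broadcast operation against any of their own unacknowledged ops, then apply it to their local doc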
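The server-side steps above can be sketched as a small in-memory class. This is a sketch under stated assumptions: `DocumentSession` and `submit` are hypothetical names, only inserts are modelled, a `threading.Lock` stands in for per-document serialization, and persistence and pub/sub broadcast are left as comments.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def transform(op: Insert, committed: Insert) -> Insert:
    """Shift op right if an already-committed insert landed at or before it."""
    if committed.pos <= op.pos:
        return Insert(op.pos + len(committed.text), op.text)
    return op


class DocumentSession:
    def __init__(self, text: str = ""):
        self.text = text
        self.log = []  # log[i] is the op that produced revision i + 1
        self.lock = threading.Lock()

    @property
    def revision(self) -> int:
        return len(self.log)

    def submit(self, op: Insert, base_revision: int):
        with self.lock:  # one op at a time per document
            # Transform against everything committed since the client's revision.
            for committed in self.log[base_revision:]:
                op = transform(op, committed)
            self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
            self.log.append(op)
            # A real server would persist the op and publish it to other clients here.
            return self.revision, op


session = DocumentSession("AC")
rev1, committed1 = session.submit(Insert(1, "B"), base_revision=0)
rev2, committed2 = session.submit(Insert(1, "X"), base_revision=0)

print(session.text)      # ABXC
print(rev2, committed2)  # 2 Insert(pos=2, text='X')
```

The returned revision and transformed op are what the server would acknowledge to the sender and broadcast to everyone else.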
Presence and Cursor Sharing
Show where each user’s cursor is in real time. Send presence on a separate, low-priority channel from document operations. Cursor positions are transient, so they are stored in Redis with a short TTL (5 seconds), not in the persistent document store. Each client refreshes its cursor via a WebSocket heartbeat. If a client disconnects, its cursor disappears when the Redis key expires.
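The expiry behaviour can be illustrated with an in-memory stand-in for the Redis keys. `PresenceStore` is a hypothetical class, not a Redis client; it takes an injectable clock so the demo does not have to sleep.

```python
import time


class PresenceStore:
    """In-memory stand-in for per-cursor keys with a short TTL."""

    def __init__(self, ttl_seconds: float = 5.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._cursors = {}  # (doc_id, user_id) -> (position, expires_at)

    def heartbeat(self, doc_id: str, user_id: str, position: int) -> None:
        # Each heartbeat overwrites the cursor and pushes the expiry forward.
        self._cursors[(doc_id, user_id)] = (position, self.clock() + self.ttl)

    def cursors(self, doc_id: str) -> dict:
        now = self.clock()
        return {
            user: pos
            for (doc, user), (pos, expires_at) in self._cursors.items()
            if doc == doc_id and expires_at > now
        }


now = [0.0]  # fake clock so the example runs instantly
store = PresenceStore(ttl_seconds=5.0, clock=lambda: now[0])

store.heartbeat("doc1", "alice", 12)
store.heartbeat("doc1", "bob", 40)

now[0] = 4.0
store.heartbeat("doc1", "alice", 15)  # alice is still connected

now[0] = 6.0                          # bob has sent nothing for 6 seconds
print(store.cursors("doc1"))          # {'alice': 15}
```

Unlike Redis, this sketch never purges expired entries; it only hides them on read.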
Offline Editing and Sync
Users edit while offline (airplane mode). On reconnect, the client sends all buffered operations along with the revision they were based on. The server transforms and applies them in sequence. Conflict resolution: OT handles this deterministically, so every replica converges on the same text as long as the operation history since that revision is preserved, although the merged result may not match what either user intended. Large divergence (many offline edits): the server may need to transform against 100+ concurrent ops; the cost is bounded by offline duration and edit rate.
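Rebasing a buffer of offline edits can be sketched as follows. This is an insert-only sketch with hypothetical names (`shift`, `rebase`); it assumes server ops win position ties. Each buffered op is transformed across the ops the client missed, and the missed ops are transformed in turn so the next buffered op sees the right positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply_insert(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def shift(op: Insert, other: Insert, other_wins_ties: bool) -> Insert:
    """Rewrite op so it applies after other."""
    if other.pos < op.pos or (other.pos == op.pos and other_wins_ties):
        return Insert(op.pos + len(other.text), op.text)
    return op


def rebase(buffered, missed):
    """Return (client ops for the server, missed server ops for the client)."""
    rebased = []
    for client_op in buffered:
        next_missed = []
        for server_op in missed:
            # Transform both sides against each other's pre-transform form.
            next_missed.append(shift(server_op, client_op, other_wins_ties=False))
            client_op = shift(client_op, server_op, other_wins_ties=True)
        missed = next_missed
        rebased.append(client_op)
    return rebased, missed


# Both sides started from "AC" at revision 0.
buffered = [Insert(1, "B"), Insert(2, "!")]  # offline client: "AC" -> "ABC" -> "AB!C"
missed = [Insert(0, "X"), Insert(3, "Y")]    # server meanwhile: "AC" -> "XAC" -> "XACY"

rebased, missed_for_client = rebase(buffered, missed)

server_doc = "XACY"
for op in rebased:
    server_doc = apply_insert(server_doc, op)

client_doc = "AB!C"
for op in missed_for_client:
    client_doc = apply_insert(client_doc, op)

print(server_doc, client_doc)  # XAB!CY XAB!CY
```

The nested loop makes the cost visible: rebasing is proportional to the number of buffered ops times the number of missed ops.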
Persistent Storage
Store the full operation log (not just the current document state). Current state = apply all ops from revision 0. Benefits: undo history (up to any revision), version history (show doc at any point in time), audit log. Snapshot periodically (every 1000 ops) to avoid replaying from revision 0 on every load.
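Loading from the nearest snapshot plus the tail of the log can be sketched like this. `DocStore` and the tuple-based op format are illustrative assumptions, and the snapshot interval is set to 3 instead of 1000 to keep the demo short.

```python
SNAPSHOT_EVERY = 3  # the text suggests 1000; small here so the demo stays short


def apply_op(text: str, op) -> str:
    if op[0] == "ins":
        _, pos, chunk = op
        return text[:pos] + chunk + text[pos:]
    _, pos = op  # ("del", pos) removes one character
    return text[:pos] + text[pos + 1:]


class DocStore:
    def __init__(self):
        self.log = []              # append-only; log[i] produces revision i + 1
        self.snapshots = {0: ""}   # revision -> full document text

    def append(self, op) -> int:
        self.log.append(op)
        revision = len(self.log)
        if revision % SNAPSHOT_EVERY == 0:
            self.snapshots[revision] = self.load(revision)
        return revision

    def load(self, revision=None) -> str:
        """Rebuild the document at a revision from the nearest earlier snapshot."""
        revision = len(self.log) if revision is None else revision
        base = max(r for r in self.snapshots if r <= revision)
        text = self.snapshots[base]
        for op in self.log[base:revision]:
            text = apply_op(text, op)
        return text


store = DocStore()
store.append(("ins", 0, "AC"))  # revision 1: "AC"
store.append(("ins", 1, "B"))   # revision 2: "ABC"
store.append(("del", 0))        # revision 3: "BC", snapshot taken
store.append(("ins", 2, "D"))   # revision 4: "BCD"

print(store.load())             # BCD  (snapshot at 3 plus one op)
print(store.load(2))            # ABC  (version history: snapshot at 0 plus two ops)
print(sorted(store.snapshots))  # [0, 3]
```

Loading the latest revision replays at most `SNAPSHOT_EVERY - 1` ops, regardless of how long the log has grown.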
Interview Tips
- OT vs. CRDT: OT typically relies on a central server to order operations; CRDTs are decentralized but accumulate metadata.
- Operation log is the source of truth; current state is derived — enables undo and version history “for free.”
- Cursor sharing is presence, not document state — different latency and persistence requirements.
- Snapshots prevent O(N) replay on every document load — periodic snapshot + apply subsequent ops.