System Design Interview: Design a Real-Time Collaborative Editor (Google Docs)
A collaborative editor allows multiple users to edit the same document simultaneously with low-latency conflict resolution. Google Docs, Notion, and Figma solve this problem. This guide covers the core algorithms (Operational Transformation and CRDTs) and the architecture for real-time collaboration at scale.
Requirements
Functional: multiple users edit the same document simultaneously, changes appear on all editors within 500ms, cursor/presence awareness (see where others are typing), offline editing with sync on reconnect, revision history.
Non-functional: no data loss on concurrent edits, eventual consistency across all clients, scales to 100 concurrent editors per document.
The Core Problem: Concurrent Edits
When two users edit simultaneously without coordination:
Initial: "Hello World"
User A inserts "," at position 5: "Hello, World"
User B deletes "o" at position 4: "Hell World"
If we apply both naively:
Apply A first: "Hello, World" → apply B (delete pos 4): "Hell, World" ✓
Apply B first: "Hell World" → apply A (insert pos 5): "Hell ,World" ✗ (wrong position!)
Concurrent edits must be transformed against each other before application.
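The divergence above can be reproduced in a few lines. This is a minimal sketch: the helper names `insert` and `delete` are made up for illustration and operate on plain strings.

```python
def insert(text: str, pos: int, char: str) -> str:
    """Return text with char inserted at index pos."""
    return text[:pos] + char + text[pos:]


def delete(text: str, pos: int) -> str:
    """Return text with the character at index pos removed."""
    return text[:pos] + text[pos + 1:]


initial = "Hello World"

# Replica 1 applies A (insert "," at 5), then B (delete at 4), untransformed.
a_then_b = delete(insert(initial, 5, ","), 4)

# Replica 2 applies B first, then A, also untransformed.
b_then_a = insert(delete(initial, 4), 5, ",")

print(a_then_b)  # Hell, World
print(b_then_a)  # Hell ,World
```

The two replicas end up with different text even though they applied the same two operations, which is exactly what a transform step has to prevent.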
Operational Transformation (OT)
OT is the algorithm used by Google Docs. Each operation (insert, delete) is transformed against concurrent operations before being applied:
# Operation types:
#   Insert(pos, char): insert char at position pos
#   Delete(pos): delete the character at position pos
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    char: str

@dataclass(frozen=True)
class Delete:
    pos: int

# Transform function: rewrite op2 so it can be applied after op1.
# Both operations must have been created against the same document state.
def transform(op1, op2):
    if isinstance(op1, Insert) and isinstance(op2, Insert):
        # Equal positions need a deterministic tie-break (e.g. server order);
        # if both sides shift, the replicas diverge.
        if op1.pos <= op2.pos:
            return Insert(op2.pos + 1, op2.char)  # shift right
        return op2  # op2 is before op1, no change
    if isinstance(op1, Insert) and isinstance(op2, Delete):
        if op1.pos <= op2.pos:
            return Delete(op2.pos + 1)  # shift right
        return op2
    if isinstance(op1, Delete) and isinstance(op2, Insert):
        if op1.pos < op2.pos:
            return Insert(op2.pos - 1, op2.char)  # shift left
        return op2
    if isinstance(op1, Delete) and isinstance(op2, Delete):
        if op1.pos < op2.pos:
            return Delete(op2.pos - 1)  # shift left
        if op1.pos == op2.pos:
            return None  # already deleted, no-op
        return op2
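To check that these rules fix the "Hello World" example, the sketch below applies the two edits in both orders, transforming whichever operation arrives second. It restates the transform rules in a compact form so that it runs on its own. The `apply` helper and the use of `dataclasses.replace` are illustrative choices, not part of any particular OT library.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Union[Insert, Delete]


def transform(op1: Op, op2: Op) -> Optional[Op]:
    """Rewrite op2 so it can be applied after op1 (same rules as above)."""
    if isinstance(op1, Insert):
        # An insert at or before op2's position pushes op2 one step right.
        return replace(op2, pos=op2.pos + 1) if op1.pos <= op2.pos else op2
    if op1.pos < op2.pos:
        # A delete before op2's position pulls op2 one step left.
        return replace(op2, pos=op2.pos - 1)
    if op1.pos == op2.pos and isinstance(op2, Delete):
        return None  # both sides deleted the same character
    return op2


def apply(text: str, op: Optional[Op]) -> str:
    """Apply a single operation to text; None is a no-op."""
    if op is None:
        return text
    if isinstance(op, Insert):
        return text[:op.pos] + op.char + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]


initial = "Hello World"
op_a = Insert(5, ",")  # User A
op_b = Delete(4)       # User B

# Site A applies its own edit, then B's edit transformed against it.
site_a = apply(apply(initial, op_a), transform(op_a, op_b))

# Site B applies its own edit, then A's edit transformed against it.
site_b = apply(apply(initial, op_b), transform(op_b, op_a))

print(site_a)  # Hell, World
print(site_b)  # Hell, World
```

Both sites converge on the same text. Two inserts at the same position are not covered by this check and would still need a tie-break.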
OT Server Architecture
Client A                 Server                   Client B
   |                       |                         |
   |  edit(op_a)           |                         |
   |---------------------->|                         |
   |                       |  edit(op_b)             |
   |                       |<------------------------|
   |                       |  transform(op_a, op_b)  |
   |                       |                         |
   |  ack(op_a_transformed)|                         |
   |<----------------------|                         |
The server is the authority for the document state. It receives all operations and puts them in a single global order. It transforms each incoming operation against any operations it has already applied that the sender had not yet seen, applies the result, and broadcasts the transformed operation to all other clients.
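The server loop described here can be sketched as follows. This is a simplified model under stated assumptions: each client sends the revision number its operation was based on (`base_revision`, a hypothetical field), the server keeps every applied operation in an in-memory list, and `DocumentServer` and `receive` are names invented for this example. Client-side transformation and tie-breaking for same-position inserts are left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Union[Insert, Delete]


def transform(op1: Op, op2: Op) -> Optional[Op]:
    """Rewrite op2 so it can be applied after op1."""
    if isinstance(op1, Insert):
        return replace(op2, pos=op2.pos + 1) if op1.pos <= op2.pos else op2
    if op1.pos < op2.pos:
        return replace(op2, pos=op2.pos - 1)
    if op1.pos == op2.pos and isinstance(op2, Delete):
        return None  # the character is already gone
    return op2


def apply(text: str, op: Op) -> str:
    if isinstance(op, Insert):
        return text[:op.pos] + op.char + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]


class DocumentServer:
    """Single authority: orders, transforms, and applies operations."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.log: list = []  # operations in the order the server applied them

    def receive(self, op: Op, base_revision: int) -> Optional[Op]:
        """Transform op against everything the sender has not seen, then apply.

        Returns the transformed operation to broadcast, or None if the
        edit turned into a no-op.
        """
        transformed: Optional[Op] = op
        for applied in self.log[base_revision:]:
            transformed = transform(applied, transformed)
            if transformed is None:
                return None
        self.text = apply(self.text, transformed)
        self.log.append(transformed)
        return transformed


server = DocumentServer("Hello World")
# Both clients edited revision 0; B's delete happens to reach the server first.
broadcast_b = server.receive(Delete(4), base_revision=0)
broadcast_a = server.receive(Insert(5, ","), base_revision=0)
print(server.text)  # Hell, World
```

The current revision is simply `len(server.log)`. A production server would also persist the log and acknowledge the sender so it can advance its own base revision.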
CRDTs (Conflict-free Replicated Data Types)
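CRDTs are an alternative to OT, implemented by libraries such as Automerge and Yjs. Figma describes its multiplayer system as CRDT-inspired, although it still relies on a central server. CRDTs are data structures that guarantee eventual consistency without a central server for conflict resolution.
CRDT for text (RGA, the Replicated Growable Array): each character has a unique identifier (site_id, logical_clock). Insertions and deletions reference characters by unique ID, not position. Concurrent insertions at the same spot are ordered deterministically by comparing IDs, and a deletion marks the character as a tombstone rather than removing it. Since IDs are globally unique and immutable, concurrent operations commute: applying them in any order produces the same result.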
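A simplified RGA-style replica might look like the sketch below. It makes several assumptions that are not in the text above: IDs are compared as `(logical_clock, site_id)` tuples, clocks behave like Lamport clocks, each operation is delivered exactly once and after the operations it depends on, and operations are plain tuples. The class and method names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ElemId = Tuple[int, str]  # (logical_clock, site_id), compared lexicographically


@dataclass
class Element:
    id: ElemId
    char: str
    deleted: bool = False  # tombstone flag; deleted elements stay in the list


class RGA:
    def __init__(self, site_id: str) -> None:
        self.site_id = site_id
        self.clock = 0
        self.elements: list = []

    def _index_of(self, elem_id: ElemId) -> int:
        for i, element in enumerate(self.elements):
            if element.id == elem_id:
                return i
        raise KeyError(elem_id)

    def local_insert(self, after: Optional[ElemId], char: str) -> tuple:
        """Create an insert after the given element (None = document start)."""
        self.clock += 1
        op = ("insert", (self.clock, self.site_id), after, char)
        self.apply(op)
        return op

    def local_delete(self, target: ElemId) -> tuple:
        op = ("delete", target)
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        """Apply a local or remote operation."""
        if op[0] == "insert":
            _, new_id, after, char = op
            self.clock = max(self.clock, new_id[0])  # Lamport clock update
            i = 0 if after is None else self._index_of(after) + 1
            # Skip concurrent elements with larger IDs so every replica
            # settles on the same order regardless of arrival order.
            while i < len(self.elements) and self.elements[i].id > new_id:
                i += 1
            self.elements.insert(i, Element(new_id, char))
        else:
            self.elements[self._index_of(op[1])].deleted = True

    def text(self) -> str:
        return "".join(e.char for e in self.elements if not e.deleted)


a, b = RGA("A"), RGA("B")
op_h = a.local_insert(None, "H")
op_i = a.local_insert(op_h[1], "i")
b.apply(op_h)
b.apply(op_i)

# Concurrent edits: both insert after "i", and B also deletes "H".
op_a = a.local_insert(op_i[1], "!")
op_b = b.local_insert(op_i[1], "?")
op_d = b.local_delete(op_h[1])

# Exchange operations; each replica sees them in a different order.
a.apply(op_b)
a.apply(op_d)
b.apply(op_a)
print(a.text(), b.text())  # i?! i?!
```

No transform step is needed: because every operation names its target by ID, the replicas converge whichever order the remote operations arrive in.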
CRDT vs. OT trade-offs:
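- OT: simpler data model (plain positions), but transform functions are hard to get right for every pair of operations; typically relies on a central server to order operations; used by Google Docs
- CRDT: more complex data structure with per-character metadata, enables peer-to-peer sync without a central server, a common fit for offline-first apps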
Presence and Cursor Awareness
Show where each user is typing in real time:
- Each client broadcasts cursor position updates every 500ms via WebSocket
- Position is stored as an offset in the document, so it must be shifted by concurrent inserts and deletes just like an operation's position
- Server forwards cursor updates to all other clients in the same document session
- Cursor positions are not persisted — they are ephemeral state in Redis (TTL 10 seconds, refreshed on each update)
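The expiry behaviour in the last bullet can be modelled without Redis. The sketch below is an in-memory stand-in, assuming a 10-second TTL refreshed on every update. `PresenceStore` and its methods are hypothetical names, and the injected `clock` function exists only so that the example can advance time by hand.

```python
import time
from typing import Callable, Dict, Tuple


class PresenceStore:
    """Ephemeral cursor positions that disappear when not refreshed."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # (doc_id, user_id) -> (cursor offset, expiry timestamp)
        self._entries: Dict[Tuple[str, str], Tuple[int, float]] = {}

    def update(self, doc_id: str, user_id: str, offset: int) -> None:
        """Record a cursor position and restart its TTL."""
        expires_at = self.clock() + self.ttl_seconds
        self._entries[(doc_id, user_id)] = (offset, expires_at)

    def cursors(self, doc_id: str) -> Dict[str, int]:
        """Return the live cursors for a document, skipping expired ones."""
        now = self.clock()
        return {
            user: offset
            for (doc, user), (offset, expires_at) in self._entries.items()
            if doc == doc_id and expires_at > now
        }


# Drive the store with a fake clock so expiry is deterministic.
fake_now = [0.0]
store = PresenceStore(ttl_seconds=10.0, clock=lambda: fake_now[0])
store.update("doc-1", "alice", 12)
store.update("doc-1", "bob", 40)

fake_now[0] = 8.0
store.update("doc-1", "alice", 15)  # alice keeps typing, bob goes quiet

fake_now[0] = 12.0
live = store.cursors("doc-1")
print(live)  # {'alice': 15}
```

With Redis the same effect would come from writing each cursor with an expiry and letting the server drop stale keys; the in-memory version only illustrates the refresh-or-expire rule.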
Document Storage
- Operations log: every operation is appended to an immutable log (Kafka or database). This enables revision history (replay operations from any point).
- Snapshots: periodically snapshot the full document state to avoid replaying from the beginning on load. Load snapshot + recent operations.
- Storage: document content in S3 (for large documents), operation log in PostgreSQL, presence in Redis.
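Loading from a snapshot plus the tail of the log, and replaying to an older revision for history, can be sketched as below. The tuple-based operation format, the `Snapshot` class, and the function names are assumptions made for this example. A real system would read the snapshot from object storage and the log from the database.

```python
from dataclasses import dataclass
from typing import List


def apply(text: str, op: tuple) -> str:
    """Apply ("insert", pos, char) or ("delete", pos) to text."""
    if op[0] == "insert":
        _, pos, char = op
        return text[:pos] + char + text[pos:]
    _, pos = op
    return text[:pos] + text[pos + 1:]


@dataclass(frozen=True)
class Snapshot:
    revision: int  # number of log entries already folded into text
    text: str


def load_document(snapshot: Snapshot, log: List[tuple]) -> str:
    """Start from the snapshot and replay only the operations after it."""
    text = snapshot.text
    for op in log[snapshot.revision:]:
        text = apply(text, op)
    return text


def text_at_revision(log: List[tuple], revision: int) -> str:
    """Revision history: replay the log from the start up to a revision."""
    return load_document(Snapshot(0, ""), log[:revision])


log = [
    ("insert", 0, "H"),
    ("insert", 1, "i"),
    ("insert", 2, "!"),
    ("delete", 0),
]
snapshot = Snapshot(revision=2, text="Hi")  # taken after the first two ops

current = load_document(snapshot, log)
earlier = text_at_revision(log, 3)
print(current)  # i!
print(earlier)  # Hi!
```

Snapshots bound the replay cost on load, while the full log still answers "what did the document look like at revision N".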
Interview Tips
- Acknowledge OT as the classic approach (Google Docs) and CRDTs as the modern alternative — this shows breadth
- You do not need to implement OT in full detail — explain the transform function concept and that the server orders operations
- Presence (cursor) is ephemeral state in Redis — separate from document state in PostgreSQL
- Offline support: buffer operations locally, replay against the server on reconnect (OT must transform against operations that happened while offline)
- The revision history use case is a strong reason to keep an append-only operation log