System Design: Collaborative Document Editing (Google Docs) — OT, CRDT, and WebSockets

The Problem: Concurrent Edits

In real-time collaborative editing (Google Docs, Notion, Figma), multiple users edit the same document simultaneously. The core challenge: User A inserts “X” at position 5 while User B simultaneously deletes the character at position 7. By the time B’s operation reaches A’s machine, A’s insert has shifted all positions by 1 — B’s operation now targets the wrong position. Naive last-write-wins produces corrupted documents.

Approach 1: Operational Transformation (OT)

OT transforms operations relative to each other before applying. When operation B arrives after A was already applied, B is transformed to account for A’s effect on the document positions.

class Op:
    INSERT = 'insert'
    DELETE = 'delete'

def transform(op_a, op_b):
    """
    Transform op_b assuming op_a was already applied.
    Returns transformed op_b that is valid after op_a.
    """
    if op_a.type == Op.INSERT and op_b.type == Op.INSERT:
        if op_a.pos <= op_b.pos:
            return Op(Op.INSERT, op_b.pos + 1, op_b.char)  # shift right
        return op_b
    if op_a.type == Op.INSERT and op_b.type == Op.DELETE:
        if op_a.pos <= op_b.pos:
            return Op(Op.DELETE, op_b.pos + 1)             # shift right
        return op_b
    if op_a.type == Op.DELETE and op_b.type == Op.INSERT:
        if op_a.pos < op_b.pos:
            return Op(Op.INSERT, op_b.pos - 1, op_b.char)  # shift left
        return op_b
    if op_a.type == Op.DELETE and op_b.type == Op.DELETE:
        if op_a.pos == op_b.pos:
            return None  # already deleted, no-op
        if op_a.pos < op_b.pos:
            return Op(Op.DELETE, op_b.pos - 1)             # shift left
        return op_b

OT requires a central server to serialize operations and track the full history of transformations. Used by Google Docs (C++ implementation called “wave”) and Apache Wave.

Approach 2: CRDTs (Conflict-free Replicated Data Types)

CRDTs are data structures that can be merged automatically without conflicts. For text editing, the key insight: instead of tracking positions (which shift), give every character a unique, stable identity and a fractional position between its neighbors.

Logoot / LSEQ: Fractional Indexing

import uuid

class Character:
    def __init__(self, value: str, position: tuple, site_id: str, clock: int):
        self.id = (position, site_id, clock)  # globally unique identifier
        self.value = value
        self.position = position              # fractional position tuple

class CRDTDocument:
    def __init__(self):
        # Sorted by position — begin and end sentinels
        self.chars = []

    def insert_between(self, left_char, right_char, value: str, site_id: str, clock: int):
        new_pos = self._generate_position(
            left_char.position if left_char else (),
            right_char.position if right_char else (float('inf'),)
        )
        char = Character(value, new_pos, site_id, clock)
        self._insert_sorted(char)
        return char

    def delete(self, char_id):
        # Mark as tombstone (soft delete) — never remove from the sequence
        # Tombstones preserve position ordering without shifting
        char = self._find(char_id)
        if char:
            char.deleted = True

    def get_text(self) -> str:
        return ''.join(c.value for c in self.chars if not getattr(c, 'deleted', False))

Architecture

Client-Server with WebSockets

Client A                Server                  Client B
   |                      |                         |
   |--[op: insert X @5]-->|                         |
   |                      |--[broadcast op_a]------>|
   |                      |   (transform for B)     |
   |<-[ack: version 42]---|                         |
   |                      |<--[op: delete @7]-------|
   |<-[broadcast op_b]----|                         |
   |  (transform for A)   |                         |

Document Session Service

class DocumentSession:
    def __init__(self, doc_id: str):
        self.doc_id = doc_id
        self.history = []          # ordered operation log
        self.version = 0
        self.connected_users = {}  # user_id -> websocket

    def apply_operation(self, op, client_version: int, user_id: str):
        # Transform op against all operations since client_version
        concurrent_ops = self.history[client_version:]
        transformed_op = op
        for server_op in concurrent_ops:
            transformed_op = transform(transformed_op, server_op)
        if transformed_op:
            self.history.append(transformed_op)
            self.version += 1
            self._broadcast(transformed_op, exclude=user_id)
        return self.version

    def _broadcast(self, op, exclude: str):
        for uid, ws in self.connected_users.items():
            if uid != exclude:
                ws.send(op.serialize())

Presence and Cursor Sharing

Show where each collaborator’s cursor is in real time. Each client broadcasts cursor position on every keystroke. The server relays cursor positions to all other connected clients. Cursor positions are ephemeral — no persistence needed. Store in Redis: HSET doc:{id}:cursors {user_id} {position}, expire with user’s WebSocket TTL. Clients display colored cursors with user names at the broadcast positions.

Persistence and Snapshots

  • Operation log: all operations persisted to a database (immutable append-only). Enables replay from any version.
  • Snapshots: periodically checkpoint the document state (every N operations or every 5 minutes). On load, apply snapshot + subsequent operations instead of replaying all history. Reduces load time from O(history) to O(recent_ops).
  • Compression: consecutive single-character inserts can be merged into a single insert operation for a string. Run a compaction job offline on the operation log.

OT vs CRDT Trade-offs

Property OT CRDT (Logoot/Yjs)
Central server Required (serializes ops) Optional (P2P possible)
Memory Lower (no per-char metadata) Higher (tombstones + IDs)
Complexity High (transform functions for each op pair) Moderate (merge is automatic)
Offline support Limited Excellent (merge on reconnect)
Used by Google Docs, Etherpad Notion, Figma, VS Code Live Share (Yjs)

Interview Questions

Q: Why can’t you use last-write-wins for collaborative editing?

Last-write-wins discards concurrent edits based on timestamp. If User A and User B both edit the same paragraph simultaneously, one edit is simply overwritten. Users lose work without warning. The correct approach (OT or CRDT) preserves all edits by transforming them to be compatible, so both edits are reflected in the final document.

Q: How does Google Docs handle offline editing?

Google Docs stores operations locally when offline. On reconnect, the local operations are sent to the server with the version number at which the client went offline. The server transforms each local operation against all concurrent server operations (using OT), applies the transformed operations, and returns the transformed server ops to the client. The client then fast-forwards its document to the server state. This is the same OT algorithm used for online concurrent editing.

Asked at: Atlassian Interview Guide

Asked at: LinkedIn Interview Guide

Asked at: Meta Interview Guide

Asked at: Databricks Interview Guide

Asked at: Twitter/X Interview Guide

Scroll to Top