Low-Level Design: Document Storage System — Upload, Versioning, and Sharing

Core Entities

User: user_id, email, name, quota_bytes, used_bytes.

Document: document_id, owner_id, name, mime_type, current_version_id, parent_folder_id, is_deleted (soft delete), created_at, updated_at.

DocumentVersion: version_id, document_id, version_number, storage_key (S3 object key), size_bytes, checksum_sha256, created_by, created_at, is_current.

Folder: folder_id, owner_id, name, parent_folder_id (NULL for root), path (materialized path for efficient tree queries, e.g., “/root/projects/q1/”), created_at.

Share: share_id, document_id, shared_with_user_id (NULL for link-based), shared_with_email, permission (VIEW, COMMENT, EDIT), created_by, expires_at, share_token (UUID for link-based), is_active.

AuditLog: log_id, user_id, document_id, action (UPLOAD, DOWNLOAD, VIEW, SHARE, DELETE, RESTORE), ip_address, user_agent, timestamp.
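The materialized path on Folder is what makes subtree queries cheap: every descendant’s path starts with its ancestor’s path, so fetching a subtree is a single prefix match (in SQL, a LIKE 'prefix%' scan over an indexed path column). A minimal in-memory sketch of the idea — folder_subtree and the sample rows are illustrative, not part of the schema above:

```python
def folder_subtree(folders: list, root_path: str) -> list:
    """All folders at or below root_path, via materialized-path prefix match.
    Equivalent SQL: SELECT * FROM folders WHERE path LIKE '/root/projects/%'"""
    return [f for f in folders if f["path"].startswith(root_path)]

folders = [
    {"folder_id": 1, "path": "/root/"},
    {"folder_id": 2, "path": "/root/projects/"},
    {"folder_id": 3, "path": "/root/projects/q1/"},
    {"folder_id": 4, "path": "/root/archive/"},
]
# Subtree of /root/projects/ -> folder_ids 2 and 3
```

The trade-off: moving a folder requires rewriting the path of every descendant, so materialized paths fit read-heavy trees like document hierarchies.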

Upload Flow with Chunked Upload and Deduplication

from math import ceil
from typing import Optional
from uuid import uuid4

# BUCKET and PART_SIZE are deployment-level constants (e.g. PART_SIZE = 8 MiB)

class DocumentService:
    def initiate_upload(self, user_id: int, filename: str,
                        file_size: int, checksum: str,
                        folder_id: Optional[int]) -> UploadSession:
        # 1. Quota check
        user = self.db.get_user(user_id)
        if user.used_bytes + file_size > user.quota_bytes:
            raise QuotaExceededError()

        # 2. Deduplication check: same checksum already stored?
        existing = self.db.query(
            "SELECT storage_key FROM document_versions WHERE checksum_sha256 = %s LIMIT 1",
            checksum
        )
        if existing:
            # Reference the existing storage object (no re-upload needed);
            # size_bytes is still recorded so quota accounting stays correct
            return UploadSession(storage_key=existing.storage_key,
                                 skip_upload=True, checksum=checksum,
                                 size_bytes=file_size)

        # 3. Generate storage key and pre-signed multipart upload
        storage_key = f"docs/{user_id}/{uuid4()}/{filename}"
        upload_id, part_urls = self.s3.create_multipart_upload(
            bucket=BUCKET, key=storage_key,
            part_count=ceil(file_size / PART_SIZE)
        )
        return UploadSession(storage_key=storage_key, upload_id=upload_id,
                             part_urls=part_urls, checksum=checksum,
                             size_bytes=file_size)

    def complete_upload(self, user_id: int, session: UploadSession,
                        folder_id: Optional[int], name: str) -> Document:
        with self.db.transaction():
            if not session.skip_upload:
                # Complete S3 multipart upload
                self.s3.complete_multipart(session.storage_key, session.upload_id)

            doc_id = self.db.insert("documents", {
                "owner_id": user_id, "name": name, "parent_folder_id": folder_id
            })
            version_id = self.db.insert("document_versions", {
                "document_id": doc_id, "version_number": 1,
                "storage_key": session.storage_key, "checksum_sha256": session.checksum,
                "is_current": True, "created_by": user_id
            })
            self.db.execute(
                "UPDATE documents SET current_version_id = %s WHERE document_id = %s",
                version_id, doc_id
            )
            self.db.execute(
                "UPDATE users SET used_bytes = used_bytes + %s WHERE user_id = %s",
                session.size_bytes, user_id
            )
            return self.db.get_document(doc_id)

Versioning and Conflict Resolution

Each upload of an existing document creates a new DocumentVersion: version_number increments, and the old version remains accessible with is_current = False.

Version retention policy: keep all versions for 30 days, then keep only the last N versions (configurable per plan).

Conflict detection for collaborative editing uses optimistic locking with a base_version_id. The client sends “I’m editing version 5.” On save, if current_version_id != 5, a conflict has occurred (another user saved in the meantime). Resolution options: (1) auto-merge (for text documents, using diff3 or OT), (2) create a conflict copy (“document – conflict copy by Alice – 2026-04-17”), (3) reject with an error and require manual resolution.

Pre-signed download URLs for old versions are generated on demand; raw S3 keys are never exposed.
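The optimistic-lock save can be sketched as below. save_new_version, ConflictError, and the dict-shaped records are hypothetical stand-ins for the real DB layer (where the check would be a conditional UPDATE under a transaction); the version_id scheme is simplified for illustration:

```python
class ConflictError(Exception):
    """Raised when the document changed since the client's base version."""

def save_new_version(doc: dict, versions: list, base_version_id: int,
                     storage_key: str, user_id: int) -> dict:
    """Optimistic lock: succeed only if nobody saved since base_version_id."""
    if doc["current_version_id"] != base_version_id:
        # Another user saved in the meantime: auto-merge, fork a
        # conflict copy, or surface the error to the client
        raise ConflictError("stale base version; re-fetch and resolve")
    new_version = {
        "version_id": base_version_id + 1,   # simplified id scheme
        "version_number": len(versions) + 1,
        "storage_key": storage_key,
        "created_by": user_id,
        "is_current": True,
    }
    versions[-1]["is_current"] = False       # old version stays readable
    versions.append(new_version)
    doc["current_version_id"] = new_version["version_id"]
    return new_version
```

A second save against the same base_version_id fails, which is exactly the “edited version 5, but current is now 6” case described above.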

Sharing and Permission Enforcement

Three sharing models:
(1) Direct user share: share with a specific email at a permission level (VIEW/COMMENT/EDIT).
(2) Link share: generate a UUID token; anyone with the link can access at the specified permission, with an optional expiry.
(3) Folder share: sharing a folder grants the same permission to all documents within it (recursive).

Permission check algorithm: is user U allowed to perform action A on document D?
Step 1: is U the owner? → allow all.
Step 2: does U have a direct Share record for D with permission >= A? → allow.
Step 3: does D’s parent folder (or any ancestor folder) have a Share for U with permission >= A? → allow.
Step 4: is there an active link-based share for D? (for link access) → allow.
Step 5: deny.

Cache permission checks in Redis for hot documents (TTL = 60s; invalidate on share create/revoke).
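The five-step check can be sketched as a pure function. can_access, share_grants, PERMISSION_RANK, and the dict-shaped inputs are assumptions for illustration; the link-token path (step 4) is left to a separate handler that resolves share_token before calling this:

```python
from datetime import datetime

# EDIT implies COMMENT implies VIEW
PERMISSION_RANK = {"VIEW": 1, "COMMENT": 2, "EDIT": 3}

def share_grants(share: dict, user_id: int, action: str, now: datetime) -> bool:
    """Does a single Share record grant `action` to `user_id` right now?"""
    if not share["is_active"]:
        return False
    if share.get("expires_at") is not None and share["expires_at"] <= now:
        return False
    if share["shared_with_user_id"] != user_id:
        return False
    return PERMISSION_RANK[share["permission"]] >= PERMISSION_RANK[action]

def can_access(user_id, action, doc, shares_by_doc, shares_by_folder,
               folders, now) -> bool:
    # Step 1: owner can do everything
    if doc["owner_id"] == user_id:
        return True
    # Step 2: direct share on the document
    if any(share_grants(s, user_id, action, now)
           for s in shares_by_doc.get(doc["document_id"], [])):
        return True
    # Step 3: walk ancestor folders via parent pointers (a materialized
    # path would fetch all ancestors in one query instead)
    folder_id = doc["parent_folder_id"]
    while folder_id is not None:
        if any(share_grants(s, user_id, action, now)
               for s in shares_by_folder.get(folder_id, [])):
            return True
        folder_id = folders[folder_id]["parent_folder_id"]
    # Step 5: deny (step 4, link access, is handled by the token resolver)
    return False
```

Keeping the check side-effect free makes it trivial to wrap with the Redis cache: key on (user_id, document_id, action) and invalidate on any share create/revoke touching the document or its ancestor folders.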
