Question 1

When should you choose a document store over a relational database?

Accepted Answer

Choose a document store when: (1) Schema is heterogeneous -- different documents in the same collection have different fields (a product catalog where electronics have different attributes than clothing). (2) Data is naturally hierarchical and read together -- a blog post with its embedded comments and tags fits better as one document than normalized across 3 tables. (3) Schema evolves rapidly -- early-stage products with frequent attribute additions benefit from schemaless flexibility. (4) Horizontal write scaling is required -- document stores shard more naturally than relational databases, because each self-contained document can be routed to one shard by its shard key, while joins and multi-row transactions that span shards are costly. (5) The application reads one entity at a time (by ID) more often than it joins across entities. Choose relational when: strong ACID transactions across multiple entities are required (financial systems), data has complex many-to-many relationships best expressed with joins, ad-hoc analytics require flexible aggregation across columns, or the data is highly normalized and stable. Many modern systems use both: relational for the transactional core, a document store for flexible product or user attributes.
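To make points (1) and (2) concrete, here is a toy sketch in Python. It assumes plain dicts and lists stand in for a collection and for relational tables (no real database is involved, and all field names and sample values are hypothetical). It shows two catalog documents with different attributes side by side, and a blog post read with one lookup versus reassembled from three normalized tables.

```python
# Toy illustration only: plain Python dicts stand in for documents and rows.

# (1) Heterogeneous schema: items in the same "collection" carry different fields.
products = [
    {"_id": 1, "type": "electronics", "name": "Laptop", "cpu": "8-core", "ram_gb": 16},
    {"_id": 2, "type": "clothing", "name": "T-shirt", "size": "M", "color": "blue"},
]

# (2) Hierarchical data read together: one document holds post, tags, and comments.
posts_by_id = {
    101: {
        "_id": 101,
        "title": "Intro to document stores",
        "tags": ["nosql", "databases"],
        "comments": [
            {"user": "ana", "text": "Nice overview"},
            {"user": "bo", "text": "What about joins?"},
        ],
    }
}

# The same data normalized across three "tables" (lists of rows).
posts_table = [{"id": 101, "title": "Intro to document stores"}]
tags_table = [{"post_id": 101, "tag": "nosql"}, {"post_id": 101, "tag": "databases"}]
comments_table = [
    {"post_id": 101, "user": "ana", "text": "Nice overview"},
    {"post_id": 101, "user": "bo", "text": "What about joins?"},
]


def read_post_document(post_id):
    """Document model: a single lookup by ID returns the whole aggregate."""
    return posts_by_id[post_id]


def read_post_relational(post_id):
    """Relational model: three reads stitched together (what a JOIN does)."""
    post = next(p for p in posts_table if p["id"] == post_id)
    tags = [t["tag"] for t in tags_table if t["post_id"] == post_id]
    comments = [
        {"user": c["user"], "text": c["text"]}
        for c in comments_table
        if c["post_id"] == post_id
    ]
    return {"_id": post["id"], "title": post["title"], "tags": tags, "comments": comments}


# Fields that only one of the two catalog items has.
print(sorted(set(products[0]) ^ set(products[1])))  # ['color', 'cpu', 'ram_gb', 'size']
print(read_post_document(101) == read_post_relational(101))  # True
```

The relational version does by hand what a three-way JOIN does; the document version gets its single read by storing comments and tags inside the post, which means duplicating them if they are also needed elsewhere.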

Question 2

How does MongoDB's aggregation pipeline work?

Accepted Answer

MongoDB's aggregation pipeline processes documents through a sequence of stages, each transforming the output of the previous one. Common stages: $match (filter documents, like WHERE), $group (aggregate by a field, like GROUP BY + aggregate functions), $project (reshape documents: include, exclude, or compute fields), $sort, $limit and $skip (pagination), $lookup (left outer join with another collection), $unwind (deconstruct an array field into one document per element). Example: total revenue by country for completed orders: db.orders.aggregate([{$match: {status: "COMPLETED"}}, {$group: {_id: "$country", total: {$sum: "$amount"}}}]).
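The stage semantics can be illustrated with a small in-memory emulation. This is a Python sketch, not MongoDB's execution engine: the helper names (match, unwind, group_sum, sort_by) are hypothetical, each takes and returns a list of dicts the way a stage consumes and emits documents, and the trailing $sort stage in the comment is added only for illustration. The order data is made up.

```python
from collections import defaultdict

# Toy emulation of a few aggregation stages on plain dicts. Each helper takes a
# list of "documents" and returns a new list, like one pipeline stage.


def match(docs, predicate):
    """$match: keep documents for which the predicate is true (like WHERE)."""
    return [d for d in docs if predicate(d)]


def unwind(docs, field):
    """$unwind: one output document per array element; documents whose array
    is missing, null, or empty are dropped (matching $unwind's default)."""
    out = []
    for d in docs:
        for element in d.get(field) or []:
            out.append({**d, field: element})
    return out


def group_sum(docs, key_field, sum_field):
    """$group: {_id: "$<key_field>", total: {$sum: "$<sum_field>"}}."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_by(docs, field, descending=False):
    """$sort on a single field."""
    return sorted(docs, key=lambda d: d[field], reverse=descending)


orders = [
    {"country": "US", "status": "COMPLETED", "amount": 120},
    {"country": "DE", "status": "COMPLETED", "amount": 80},
    {"country": "US", "status": "CANCELLED", "amount": 500},
    {"country": "US", "status": "COMPLETED", "amount": 30},
]

# Roughly equivalent to:
# db.orders.aggregate([{$match: {status: "COMPLETED"}},
#                      {$group: {_id: "$country", total: {$sum: "$amount"}}},
#                      {$sort: {total: -1}}])
stage1 = match(orders, lambda d: d["status"] == "COMPLETED")
stage2 = group_sum(stage1, "country", "amount")
result = sort_by(stage2, "total", descending=True)
print(result)  # [{'_id': 'US', 'total': 150}, {'_id': 'DE', 'total': 80}]

# $unwind then $group: count how many posts use each tag.
posts = [{"title": "a", "tags": ["db", "nosql"]}, {"title": "b", "tags": ["db"]}]
tag_counts = group_sum([dict(d, n=1) for d in unwind(posts, "tags")], "tags", "n")
print(tag_counts)  # [{'_id': 'db', 'total': 2}, {'_id': 'nosql', 'total': 1}]
```

Putting the filter first means later stages only see completed orders, which is the same reason $match usually goes at the start of a real pipeline.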

Question 3

What is the N+1 problem in document stores and how do you solve it?

Accepted Answer

The N+1 problem: you query for N parent documents, then issue one query per document to fetch related data (N+1 queries in total). Example: fetch 20 blog posts, then fetch the author document for each post: 1 + 20 = 21 queries. Solutions: (1) Denormalize: embed the relevant author fields (author_name, author_avatar) directly in each post document, so reading posts no longer requires a separate author lookup. Trade-off: author data is duplicated across all posts, and every post must be updated if the author's name changes (or you accept eventual consistency for non-critical fields). (2) Application-level batch fetch: fetch all 20 posts, collect the unique author_ids, fetch all authors in one query (db.users.find({_id: {$in: author_ids}})), build an id-to-author map, and join in application code. 2 queries total. (3) MongoDB $lookup (a join in the aggregation pipeline): handles the join at the database level. For reads dominated by single-document fetches (user profile, product page), embed to eliminate N+1. For analytical queries across many documents, use a batch fetch or the aggregation pipeline.
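A toy query counter makes the difference measurable. The sketch below assumes a hypothetical CountingCollection class with find_one, find_in, and find_all methods (loosely mirroring a lookup by _id, a $in query, and a plain find()); it is not a real driver API. It reproduces the 20-post example: the naive loop issues 21 queries, the batch fetch issues 2, and a denormalized posts collection needs just 1.

```python
class CountingCollection:
    """Hypothetical in-memory collection that counts the queries it serves."""

    def __init__(self, docs):
        self._by_id = {d["_id"]: d for d in docs}
        self.queries = 0

    def find_one(self, doc_id):          # like findOne({_id: doc_id})
        self.queries += 1
        return self._by_id.get(doc_id)

    def find_in(self, ids):              # like find({_id: {$in: ids}})
        self.queries += 1
        return [self._by_id[i] for i in ids if i in self._by_id]

    def find_all(self):                  # like find({})
        self.queries += 1
        return list(self._by_id.values())


users = CountingCollection([{"_id": f"u{i}", "name": f"User {i}"} for i in range(5)])
posts = CountingCollection(
    [{"_id": p, "title": f"Post {p}", "author_id": f"u{p % 5}"} for p in range(20)]
)
# Solution (1): author_name already embedded in every post.
embedded_posts = CountingCollection(
    [{"_id": p, "title": f"Post {p}", "author_id": f"u{p % 5}",
      "author_name": f"User {p % 5}"} for p in range(20)]
)


def reset():
    posts.queries = users.queries = 0


def total_queries():
    return posts.queries + users.queries


def load_posts_naive():
    """N+1: one query for the posts, then one author query per post."""
    result = []
    for post in posts.find_all():
        author = users.find_one(post["author_id"])
        result.append({**post, "author_name": author["name"]})
    return result


def load_posts_batched():
    """Solution (2): one query for posts, one $in-style query for all authors."""
    page = posts.find_all()
    author_ids = sorted({p["author_id"] for p in page})
    authors = {u["_id"]: u for u in users.find_in(author_ids)}  # id -> author map
    return [{**p, "author_name": authors[p["author_id"]]["name"]} for p in page]


reset()
naive = load_posts_naive()
naive_queries = total_queries()

reset()
batched = load_posts_batched()
batched_queries = total_queries()

embedded = embedded_posts.find_all()
embedded_queries = embedded_posts.queries

print(naive_queries, batched_queries, embedded_queries)  # 21 2 1
```

All three approaches return identical post lists; only the number of round trips changes, which is what dominates latency when each query crosses the network.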

Question 4

How does DynamoDB's single-table design work?

Accepted Answer

DynamoDB's single-table design colocates multiple entity types in one table so that each access pattern can be served by a single request. Every item has a PK (partition key) and SK (sort key). By using prefixes and composite keys, you store different entity types together: PK="USER#123", SK="PROFILE" (user profile), PK="USER#123", SK="ORDER#2024-01-15#456" (user's order), PK="ORDER#456", SK="ITEM#789" (order item). Access patterns: "get user profile" = GetItem(PK="USER#123", SK="PROFILE"). "get all orders for user 123" = Query(PK="USER#123", SK begins_with "ORDER#"). "get all items in order 456" = Query(PK="ORDER#456", SK begins_with "ITEM#"). Because Query returns items in sort-key order, putting an ISO date in the SK also returns a user's orders chronologically. This colocation gives relational-like parent-with-children reads without joins: GetItem is a direct key lookup, and Query reads one contiguous, sorted range within a single partition. GSI (Global Secondary Index): define an alternate PK/SK for different access patterns (e.g., a GSI on email for user lookup by email). Design discipline: enumerate all required access patterns before designing the schema -- DynamoDB is access-pattern-driven, not schema-driven.
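The key layout can be exercised with an in-memory model. In this Python sketch, the SingleTable class and its put_item, get_item, query, and query_gsi methods are hypothetical stand-ins that imitate the semantics of PutItem, GetItem, Query with begins_with on the sort key, and a query against a GSI keyed on email; it is not the boto3 API. Order 457, its date, and the email address are made-up sample data.

```python
class SingleTable:
    """Hypothetical in-memory model of a single-table DynamoDB layout."""

    def __init__(self, gsi_attr="email"):
        self._partitions = {}     # PK -> {SK: item}
        self._gsi_attr = gsi_attr
        self._gsi = {}            # GSI partition value -> set of (PK, SK)

    def put_item(self, item):
        # Like PutItem: an item with the same PK+SK replaces the old one,
        # and the index entry for the old item is removed.
        key = (item["PK"], item["SK"])
        old = self.get_item(*key)
        if old is not None and self._gsi_attr in old:
            self._gsi[old[self._gsi_attr]].discard(key)
        self._partitions.setdefault(item["PK"], {})[item["SK"]] = item
        # Items without the attribute stay out of the index (a sparse index).
        if self._gsi_attr in item:
            self._gsi.setdefault(item[self._gsi_attr], set()).add(key)

    def get_item(self, pk, sk):
        # Like GetItem: direct lookup by the full primary key.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # Like Query with "PK = :pk AND begins_with(SK, :prefix)": touch only
        # one partition and return matches in ascending sort-key order.
        part = self._partitions.get(pk, {})
        return [part[sk] for sk in sorted(part) if sk.startswith(sk_prefix)]

    def query_gsi(self, value):
        # Like querying a GSI whose partition key is the indexed attribute.
        return [self.get_item(pk, sk) for pk, sk in sorted(self._gsi.get(value, ()))]


table = SingleTable()
table.put_item({"PK": "USER#123", "SK": "PROFILE", "name": "Ana", "email": "ana@example.com"})
table.put_item({"PK": "USER#123", "SK": "ORDER#2024-03-02#457", "total": 40})
table.put_item({"PK": "USER#123", "SK": "ORDER#2024-01-15#456", "total": 25})
table.put_item({"PK": "ORDER#456", "SK": "ITEM#789", "qty": 2})

print(table.get_item("USER#123", "PROFILE")["name"])        # Ana
print([o["SK"] for o in table.query("USER#123", "ORDER#")])
# ['ORDER#2024-01-15#456', 'ORDER#2024-03-02#457']  (chronological via ISO dates)
print(table.query_gsi("ana@example.com")[0]["SK"])          # PROFILE
```

The profile and the orders share PK="USER#123", so the begins_with prefix is what separates entity types inside one partition; the email lookup needs the GSI because it does not start from a known PK.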

Question 5

How do you handle schema migrations in a document store?

Accepted Answer

Document stores typically don't enforce a schema, so "migrations" differ from a relational ALTER TABLE. Strategies: (1) Lazy migration: add a schema_version field to each document. Application code handles both the old and new formats (if version == 1, read the old field name; otherwise read the new one). Documents are upgraded to the new schema the next time they are written. Pros: no downtime, no batch job needed. Cons: migration logic stays in application code indefinitely (documents that are never written again stay in the old format), and reporting queries must handle both formats. (2) Background migration job: a script reads documents in batches, transforms them to the new schema, and writes them back, typically during low-traffic hours. Pros: clean code once all documents are migrated. Cons: risk of missing documents, and a bug in the script can damage many documents at once. (3) Write-to-new, read-from-both (dual-read period): new writes use the new schema; reads fall back to the old schema when a document has not been migrated yet. Once the migration job catches up, drop the old-format fallback. (4) New collection: write new documents to a new collection, migrate old documents into it, then rename it to replace the original. Zero downtime is possible, but it requires careful coordination.
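Strategies (1) and (2) can be sketched over a dict standing in for a collection. Assumptions, all hypothetical: a v1 document stores name, v2 renames it to display_name, and a missing schema_version means v1. The read path understands both formats, every write persists the upgraded form, and a re-runnable backfill upgrades whatever is left in batches. A real job would also need to guard against concurrent writes, for example by making each update conditional on the document's current schema_version.

```python
CURRENT_VERSION = 2  # hypothetical: v1 "name" -> v2 "display_name"


def version_of(doc):
    # Documents written before versioning was introduced count as v1.
    return doc.get("schema_version", 1)


def upgrade(doc):
    """Return `doc` in v2 format; v2 documents pass through unchanged."""
    if version_of(doc) >= CURRENT_VERSION:
        return doc
    new = dict(doc)
    new["display_name"] = new.pop("name")
    new["schema_version"] = CURRENT_VERSION
    return new


def read_display_name(doc):
    """Lazy migration, read path: understands both formats."""
    if version_of(doc) == 1:
        return doc["name"]
    return doc["display_name"]


def save(collection, doc):
    """Lazy migration, write path: every write stores the upgraded form."""
    collection[doc["_id"]] = upgrade(doc)


def backfill(collection, batch_size=2):
    """Background job: upgrade remaining v1 documents in batches.
    Idempotent, so it is safe to re-run after a crash."""
    ids = sorted(collection)
    migrated = 0
    for start in range(0, len(ids), batch_size):
        for doc_id in ids[start:start + batch_size]:
            doc = collection[doc_id]
            if version_of(doc) < CURRENT_VERSION:
                collection[doc_id] = upgrade(doc)
                migrated += 1
    return migrated


users = {
    1: {"_id": 1, "name": "Ana"},                              # v1 (no version field)
    2: {"_id": 2, "display_name": "Bo", "schema_version": 2},  # already v2
    3: {"_id": 3, "name": "Cy", "schema_version": 1},          # v1
}
names = [read_display_name(d) for d in users.values()]  # works across both formats
save(users, users[1])           # lazy: user 1 is upgraded when it is written
first_run = backfill(users)     # background job migrates the remaining v1 doc
second_run = backfill(users)    # re-running is a no-op
print(names, first_run, second_run)  # ['Ana', 'Bo', 'Cy'] 1 0
```

Once the backfill reports zero remaining documents, the version-1 branch in read_display_name can be deleted, which is the "clean code once all documents are migrated" payoff of strategy (2).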

System Design: Document Store — Schema-Flexible Storage, Indexing, and Consistency Trade-offs

What is a Document Store?

Data Modeling

Indexing Strategy

Consistency and Transactions

Sharding and Horizontal Scale