System Design Interview: Design an Object Storage System (Amazon S3)

What Is an Object Storage System?

Object storage (Amazon S3, Google Cloud Storage) stores arbitrary-size files (objects) in named buckets. Unlike a filesystem (hierarchical directories), objects are flat key-value pairs: bucket/key → bytes. Objects are immutable — you write a new version, not modify in place. Amazon S3 stores trillions of objects, exabytes of data.

System Requirements

Functional

• PUT object: upload bytes, return a URL
• GET object: download bytes by bucket/key
• DELETE object
• Multipart upload for large objects (a single PUT is capped at 5GB)
• Versioning: keep multiple versions of the same key
• Lifecycle policies: auto-delete or archive objects after N days

Non-Functional

• Durability: 99.999999999% (11 nines), S3's published durability design target (its SLA covers availability, not durability)
• Availability: 99.99%
• Throughput: terabytes/second aggregate across the service

Architecture

Metadata Service

Stores object metadata: bucket, key, size, content-type, owner, checksum (MD5/SHA-256), version_id, and storage_location (which data nodes hold the chunks). Backed by a strongly consistent distributed database (DynamoDB or a custom sharded MySQL), partitioned by a hash of bucket+key. On PUT: write the data chunks first, then commit the metadata, so metadata never points at missing data. On GET: read metadata to find the data location, then stream the data.
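The metadata path can be sketched with an in-memory stand-in for the sharded store. `ObjectMeta`, `MetadataStore`, and the shard-by-hash scheme below are illustrative names for this design, not S3's actual internals:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ObjectMeta:
    bucket: str
    key: str
    size: int
    content_type: str
    checksum: str          # hex SHA-256 of the object bytes
    version_id: int
    chunk_locations: list  # [(chunk_id, [data_node_ids]), ...]

class MetadataStore:
    """In-memory stand-in for the sharded, strongly consistent metadata DB."""

    def __init__(self, num_shards: int = 16):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, bucket: str, key: str) -> dict:
        # Partition by a hash of bucket+key, as described above.
        h = hashlib.sha256(f"{bucket}/{key}".encode()).digest()
        return self.shards[h[0] % len(self.shards)]

    def put(self, meta: ObjectMeta) -> None:
        self._shard(meta.bucket, meta.key)[(meta.bucket, meta.key)] = meta

    def get(self, bucket: str, key: str) -> ObjectMeta:
        return self._shard(bucket, key)[(bucket, key)]
```

A real deployment would replace the dict shards with database partitions and add conditional writes for versioning; the routing logic stays the same.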

Data Nodes

Objects are split into chunks (typically 64MB each for large objects). Each chunk is replicated 3x across data nodes in different availability zones (cross-AZ replication). Chunks are stored as flat files on disk, indexed by a local key-value store; no further filesystem abstraction is needed. Data nodes expose a simple HTTP API: PUT /chunk/{id}, GET /chunk/{id}.
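Chunking and cross-AZ placement can be sketched as follows. `split_into_chunks` and `place_replicas` are hypothetical helpers for this design; real placement would also weigh node capacity and load:

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, as in the design above

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split an object into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(chunk_id: str, nodes_by_az: dict, replicas: int = 3) -> list:
    """Pick one data node in each of `replicas` AZs, chosen by chunk hash.

    nodes_by_az maps an AZ name to its list of data node ids. Using a stable
    hash keeps placement deterministic, so any server can recompute it.
    """
    h = int.from_bytes(hashlib.sha256(chunk_id.encode()).digest()[:8], "big")
    azs = sorted(nodes_by_az)[:replicas]
    return [nodes_by_az[az][h % len(nodes_by_az[az])] for az in azs]
```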

Durability via Erasure Coding

For cost-efficient durability, use erasure coding (Reed-Solomon) instead of 3x replication. Split an object into k data chunks and m parity chunks; any k of the k+m chunks can reconstruct the full object, tolerating up to m failures. A common scheme is 6+3: 6 data chunks and 3 parity chunks spread across 9 data nodes, so data survives the failure of any 3 nodes. Storage overhead is 9/6 = 1.5x versus 3x for replication: 50% storage savings, at the cost of extra CPU for encoding and decoding.
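Production systems use Reed-Solomon, but the recovery idea can be demonstrated with the simplest erasure code: a single XOR parity chunk (k data chunks + 1 parity, tolerating one loss). This is a sketch of the principle, not S3's actual codec:

```python
def xor_parity(chunks: list) -> bytes:
    """Compute a parity chunk as the bytewise XOR of equal-size chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover(chunks_with_gap: list, parity: bytes) -> list:
    """Rebuild the single missing chunk (marked None) from survivors + parity.

    Works because XOR-ing the parity with every surviving data chunk
    cancels them out, leaving exactly the missing chunk.
    """
    survivors = [c for c in chunks_with_gap if c is not None]
    missing = xor_parity(survivors + [parity])
    return [missing if c is None else c for c in chunks_with_gap]
```

For k=3 data chunks this scheme stores 4/3 = 1.33x the data but only survives one loss; Reed-Solomon generalizes the same cancellation idea over finite-field arithmetic so that m parity chunks survive any m losses.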

Multipart Upload

For large files (GB–TB): split into parts, upload them concurrently, and complete when all parts have arrived.

1. CreateMultipartUpload → returns upload_id
2. UploadPart(upload_id, part_number, bytes) for each part (min 5MB each, except the last part)
3. CompleteMultipartUpload(upload_id, [part_number, etag] list) → atomically commits the object

Benefits: resume on failure (only re-upload failed parts), parallel upload from multiple threads, no single-connection bandwidth bottleneck.
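The three-step flow above can be sketched end to end with an in-memory stand-in for the server. `UploadServer` and its method names are hypothetical; a real client would speak HTTP to the service:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # minimum part size (except the last part)

class UploadServer:
    """In-memory stand-in for the storage service's multipart API."""

    def __init__(self):
        self.uploads = {}  # upload_id -> {part_number: (etag, data)}

    def create_multipart_upload(self) -> str:
        upload_id = f"upl-{len(self.uploads)}"
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        etag = hashlib.md5(data).hexdigest()
        self.uploads[upload_id][part_number] = (etag, data)
        return etag

    def complete(self, upload_id: str, parts: list) -> bytes:
        """Atomically commit: verify every (part_number, etag), then assemble."""
        stored = self.uploads.pop(upload_id)
        assert all(stored[n][0] == etag for n, etag in parts), "etag mismatch"
        return b"".join(stored[n][1] for n, _ in sorted(parts))

def multipart_upload(server, data: bytes, part_size: int = PART_SIZE,
                     workers: int = 8) -> bytes:
    """Split `data` into parts and upload them concurrently."""
    upload_id = server.create_multipart_upload()
    parts = [(i + 1, data[off:off + part_size])
             for i, off in enumerate(range(0, len(data), part_size))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        etags = list(pool.map(
            lambda p: (p[0], server.upload_part(upload_id, *p)), parts))
    return server.complete(upload_id, etags)
```

Retry logic is the piece this sketch omits: a production client would retry only the parts whose upload failed, which is exactly the resumability benefit described above.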

Consistency Model

S3 now offers strong read-after-write consistency: after a successful PUT, subsequent GET operations return the new object. Before December 2020, S3 provided only eventual consistency for overwrites and deletes. One way to implement strong consistency is to route all requests for the same key through a consistent hash ring to the same primary node (or to use conditional writes coordinated by a distributed lock).
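A minimal consistent hash ring for the key-to-primary routing might look like this (the vnode count and hashing details are illustrative choices, not a specific system's parameters):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring: every request for a key routes to one primary."""

    def __init__(self, nodes: list, vnodes: int = 64):
        # Each node owns `vnodes` points on the ring to smooth the load.
        self.ring = sorted(
            (self._h(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def primary(self, bucket: str, key: str) -> str:
        """Walk clockwise from the key's hash to the first node point."""
        i = bisect.bisect(self.points, self._h(f"{bucket}/{key}"))
        return self.ring[i % len(self.ring)][1]
```

Because the mapping is deterministic, any front-end server computes the same primary for a given bucket/key, which is what lets that primary serialize reads and writes for the key.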

Caching and CDN

S3 itself is not a CDN; each bucket is served from a single region. For global low-latency access, configure CloudFront (a CDN) in front of S3. Edge PoPs cache objects near users, with TTLs controlled by Cache-Control headers on the S3 object. For frequently accessed objects (images, JS bundles), 95%+ of traffic can be served from the CDN edge without hitting S3.
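The edge-caching behavior reduces to a TTL cache in front of an origin fetch. `EdgeCache` below is an illustrative sketch; a real CDN edge also honors Cache-Control directives such as no-cache and revalidates with conditional GETs:

```python
import time

class EdgeCache:
    """Minimal per-PoP TTL cache: a stand-in for a CDN edge in front of S3."""

    def __init__(self):
        self.store = {}  # key -> (expires_at, value)

    def get(self, key, fetch_from_origin, ttl_seconds):
        now = time.time()
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1], "HIT"       # served from the edge, S3 untouched
        value = fetch_from_origin(key)   # cache miss: go back to the origin
        self.store[key] = (now + ttl_seconds, value)
        return value, "MISS"
```

With a long TTL and a hot object, every request after the first is a HIT, which is how 95%+ of traffic for popular objects stays off the origin.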

Interview Tips

• Metadata service + data nodes = separation of concerns. Metadata is small but consistency-critical; data is large but throughput-critical.
• Erasure coding vs. replication: 1.5x overhead vs. 3x. Erasure coding wins at petabyte scale.
• Multipart upload: required for any object over 5GB (the single-PUT limit), and long-lived single HTTP connections are unreliable for huge uploads anyway.
• Durability (11 nines) requires cross-AZ replication, so a single datacenter fire cannot cause data loss.

FAQ

How does Amazon S3 achieve 11 nines of durability?

S3's 99.999999999% durability means losing about one object per 100 billion object-years. It is achieved through: (1) Geographic redundancy: objects are replicated across at least 3 Availability Zones within a region; each AZ is physically separate (different power grid, flood zone, network), so loss of an entire AZ doesn't affect the object. (2) Erasure coding (for the Standard storage class): Reed-Solomon coding splits data into k data shards + m parity shards, and any k shards can reconstruct the object. At S3's scale, 6+3 or 8+4 coding is common, tolerating the loss of 3–4 shards (entire storage nodes). (3) Data integrity verification: every write is checksummed (MD5 + CRC32), reads are verified against the checksum, and silent data corruption (bit rot on disk) is detected and repaired automatically. (4) Continuous scrubbing: background processes continuously read and verify all stored data, repairing corrupted blocks from intact shards before more shards fail. (5) Cross-region replication (optional): for additional durability, objects can be replicated asynchronously to a second region.

How does multipart upload work for large objects in S3?

Standard single PUT requests have a 5GB limit and are susceptible to connection failures that require restarting from zero. Multipart upload addresses this for large objects (S3 minimum part size: 5MB, except the last part). Flow: (1) InitiateMultipartUpload → the server returns an upload_id. (2) UploadPart(upload_id, part_number 1–10000, bytes) → returns an ETag (MD5 of the part). Each part can be uploaded in parallel from multiple threads or machines, and failed parts can be retried without restarting others. (3) CompleteMultipartUpload(upload_id, [part_number: etag] list) → the server assembles the final object atomically, verifying that all parts are present and their checksums match; if any part is missing, the complete operation fails. (4) AbortMultipartUpload cleans up incomplete uploads (add a lifecycle rule to auto-abort incomplete uploads after 7 days to avoid paying for stored parts). Concurrency: for a 100GB file with 64MB parts, that is roughly 1563 parts; uploading 50 at a time multiplies effective bandwidth accordingly.

What is the difference between S3 Standard, S3-IA, and S3 Glacier?

S3 offers multiple storage classes with different availability, retrieval-latency, and cost trade-offs. S3 Standard: 99.99% availability, millisecond retrieval; most expensive per GB stored (~$0.023/GB/month); for frequently accessed data. S3 Standard-IA (Infrequent Access): 99.9% availability, millisecond retrieval; cheaper storage (~$0.0125/GB/month) but with a per-GB retrieval fee and a minimum 30-day storage charge; for data accessed monthly or less (backups, DR). S3 Glacier Instant Retrieval: millisecond retrieval, 90-day minimum; cheapest for cold data with occasional access. S3 Glacier Flexible Retrieval: 3–5 hour retrieval, very cheap storage; for archives accessed annually. S3 Glacier Deep Archive: 12-hour retrieval, cheapest of all; true cold storage. Lifecycle policies automate transitions: Standard → IA after 30 days, → Glacier after 90 days, → delete after 365 days. Engineering decision framework: ask "how often is this data accessed?" and choose the storage class accordingly; use S3 Intelligent-Tiering when access patterns are unpredictable, since it moves objects between tiers automatically based on actual access.
