Question 1

What is the role of the NameNode in HDFS and what are its failure modes?

Accepted Answer

The HDFS NameNode is the single master that maintains all filesystem metadata: the directory tree, the file-to-block mapping, and the block-to-DataNode locations. All of it is held in memory for fast access, at roughly 150 bytes per namespace object (file, directory, or block) by the usual rule of thumb. Persistence: every namespace change is appended to an edit log on disk, and periodic checkpoints (the FsImage) snapshot the full namespace. Block locations are not persisted; the NameNode rebuilds them from DataNode block reports at startup. Failure modes: (1) NameNode crash: without HA, HDFS is unavailable until the NameNode restarts. Data on DataNodes is safe but inaccessible. (2) Memory exhaustion: very large namespaces (hundreds of millions to billions of files) can exhaust the NameNode heap. Mitigate by using large block sizes (the 128 MB default keeps the block count down) and federation (multiple NameNodes, each owning a namespace partition). (3) Edit log corruption: if the edit log is corrupted, recent operations may be lost, so the edit log is written to multiple directories (for example local disk plus NFS) for redundancy. HDFS HA solution: two NameNodes (Active and Standby) share an edit log through the Quorum Journal Manager (QJM), which writes to a quorum of JournalNodes (at least 3). The Standby continuously replays the shared edit log; with automatic failover enabled (ZooKeeper plus a ZKFailoverController alongside each NameNode), it typically takes over within seconds of an Active failure.
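The interplay between the edit log and FsImage checkpoints can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the `ToyNameNode` class and its method names are hypothetical, the namespace is a plain dictionary, and the checkpoint is taken in place, whereas in a real cluster checkpointing is done by a Secondary or Standby NameNode.

```python
import json


class ToyNameNode:
    """Toy in-memory namespace with an edit log and FsImage-style checkpoints."""

    def __init__(self):
        self.namespace = {}   # path -> list of block IDs (the in-memory metadata)
        self.edit_log = []    # operations recorded since the last checkpoint
        self.fsimage = "{}"   # serialized snapshot of the namespace

    def create(self, path, block_ids):
        self._log_and_apply(("create", path, list(block_ids)))

    def delete(self, path):
        self._log_and_apply(("delete", path, None))

    def _log_and_apply(self, op):
        # Record the edit first, then mutate the in-memory state.
        self.edit_log.append(op)
        self._apply(self.namespace, op)

    @staticmethod
    def _apply(namespace, op):
        kind, path, blocks = op
        if kind == "create":
            namespace[path] = blocks
        elif kind == "delete":
            namespace.pop(path, None)

    def checkpoint(self):
        # Snapshot the full namespace and start a fresh edit log.
        self.fsimage = json.dumps(self.namespace)
        self.edit_log = []

    def recover(self):
        # Rebuild state after a crash: load the snapshot, then replay the edits.
        namespace = json.loads(self.fsimage)
        for op in self.edit_log:
            self._apply(namespace, op)
        return namespace


nn = ToyNameNode()
nn.create("/logs/a", [1, 2])
nn.checkpoint()               # the snapshot now contains /logs/a
nn.create("/logs/b", [3])     # recorded only in the edit log
nn.delete("/logs/a")          # recorded only in the edit log
recovered = nn.recover()      # snapshot plus replayed edits
```

Recovery yields the same namespace that was in memory before the simulated crash, which is why losing the edit log loses exactly the operations made since the last checkpoint.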

Question 2

How does rack-aware replication work in HDFS?

Accepted Answer

HDFS stores three replicas of each block by default (replication factor = 3) and uses rack awareness to tolerate both node and rack failures. Default placement policy: (1) First replica on the same node as the writer (or on a random node if the client is outside the cluster). (2) Second replica on a node in a different rack from the first. (3) Third replica on the same rack as the second, but on a different node. Why this placement: the replicas span two racks, so if any single rack loses power or network connectivity, at least one replica sits on the other rack and remains accessible. Putting the second and third replicas on the same rack reduces cross-rack bandwidth in the write pipeline (one cross-rack transfer per block instead of two). Trade-off: the policy survives the loss of one rack, but not the simultaneous loss of both racks that hold a block. Placing one replica on each of three racks would survive two rack failures, at the cost of doubling cross-rack write traffic. The administrator defines the rack topology with a script, set via net.topology.script.file.name, that maps DataNode IP addresses or hostnames to rack identifiers such as /rack1.
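The three placement steps can be sketched as a small function. This is an illustration under simplifying assumptions, not the HDFS implementation: `place_replicas`, the `topology` dictionary, and the node and rack names are hypothetical, and the fallbacks a real cluster applies (full or busy nodes, too few racks) are reduced to a single error.

```python
import random


def place_replicas(topology, writer=None, rng=None):
    """Pick three nodes following the default rack-aware policy (simplified).

    topology: dict mapping rack ID -> list of node names (hypothetical input).
    writer:   node name of the client, or None if it is outside the cluster.
    """
    rng = rng or random.Random()
    node_to_rack = {n: rack for rack, nodes in topology.items() for n in nodes}

    # (1) First replica: the writer's own node, else a random node.
    first = writer if writer in node_to_rack else rng.choice(sorted(node_to_rack))
    first_rack = node_to_rack[first]

    # (2) Second replica: a different rack. It needs two nodes so that
    #     the third replica can share the rack on a different node.
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != first_rack and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    second_rack = rng.choice(remote_racks)

    # (3) Third replica: same rack as the second, different node.
    second, third = rng.sample(sorted(topology[second_rack]), 2)
    return [first, second, third]


topology = {
    "/rack1": ["n1", "n2"],
    "/rack2": ["n3", "n4"],
    "/rack3": ["n5", "n6"],
}
replicas = place_replicas(topology, writer="n1", rng=random.Random(0))
```

Whatever the random choices, the result always spans exactly two racks, with a single node on the writer's rack and two nodes on one remote rack.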

Question 3

What is Gorilla compression for time series data and how does HDFS block storage compare to it?

Accepted Answer

These are distinct concepts that address different problems. Gorilla (Facebook's in-memory TSDB, described in 2015) introduced a compression scheme for time series data: delta-of-delta encoding for timestamps (samples usually arrive at a fixed interval, so the second-order delta is often zero or very small) and XOR encoding for float values (successive measurements often share most of their bits, so the XOR with the previous value is mostly zeros and encodes compactly). It achieves about 1.37 bytes per sample on average versus 16 bytes raw (an 8-byte timestamp plus an 8-byte value). Prometheus TSDB uses a Gorilla-derived encoding for its chunks. HDFS block storage is general-purpose distributed storage with large blocks (128 MB by default), pipeline replication, and CRC checksumming. It applies no domain-specific compression and stores bytes as written, though files themselves can use compressed formats (for example Parquet with Snappy or LZ4, as commonly written by Spark). The distinction: Gorilla is application-level compression optimized for time series access patterns; HDFS is infrastructure-level storage optimized for large sequential reads and writes. In practice the two are layered: a time series system compresses samples first, then persists the compressed blocks in object or distributed storage (Thanos, for example, uploads Prometheus blocks to object storage such as S3).
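The two transforms behind Gorilla-style compression can be shown on a few samples. This sketch only computes the intermediate values (second-order timestamp deltas and XORs of consecutive float bit patterns); it omits the variable-length bit packing that produces the actual space savings, and the function names are hypothetical.

```python
import struct


def delta_of_delta(timestamps):
    """Second-order differences of a timestamp series."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern (big-endian)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stream(values):
    """XOR each value's bit pattern with that of the previous value."""
    bits = [float_bits(v) for v in values]
    return [a ^ b for a, b in zip(bits, bits[1:])]


# Scrapes every 60 s, with the last one arriving 1 s late.
dod = delta_of_delta([0, 60, 120, 180, 241])   # [0, 0, 1]

# A repeated value XORs to zero; 12.0 -> 24.0 differs in a single bit.
xors = xor_stream([12.0, 12.0, 24.0])          # [0, 0x0010000000000000]
```

Most of the resulting numbers are zero or have only a few significant bits, which is what lets the real encoder spend as little as one bit on an unchanged timestamp interval or an unchanged value.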

Feature	HDFS	S3
Locality	Compute on same nodes as data (Spark/Hadoop locality)	Compute and storage separated — network hop required
Throughput	Very high for sequential reads (local disk)	High but network-bound
Mutation	Append-only (no in-place edits to existing files)	Objects are immutable; a PUT replaces the whole object (or adds a new version if versioning is enabled)
Management	Self-managed cluster	Fully managed by AWS

System Design Interview: Design a Distributed File System (HDFS / GFS)

What Is a Distributed File System?

Architecture: Master-Worker (NameNode-DataNode)

NameNode (Master)

DataNode (Worker)

Block Storage

Read Path

Write Path

Fault Tolerance

HDFS vs. Object Storage (S3)

Interview Tips