Question 1

How does a blockchain explorer stay synchronized with the chain in real time?

Accepted Answer

The explorer subscribes to new block events through the node's WebSocket API (eth_subscribe with "newHeads" on Ethereum). When a new block header arrives, it fetches the full block and all transaction receipts over JSON-RPC, processes them, writes the results to the database in a single atomic transaction, and only then marks the block as indexed. A chain reorganization (reorg) is detected when a new block's parent hash does not match the hash of the last indexed block. On a reorg: find the fork point by walking back parent hashes until a block matches one that is already indexed, roll back every block on the orphaned fork (mark it non-canonical and reverse its balance and state changes), then index the canonical blocks. Reorgs on Ethereum are usually shallow, rarely more than 2-3 blocks, so a rollback buffer of about 10 blocks is sufficient for most chains, though the right depth depends on the chain's finality guarantees.
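The fork-point walk described above can be sketched as follows. This is a minimal in-memory illustration, not a production indexer: the Header type, the indexed dict (height to hash), and the fetch_header callable (standing in for a JSON-RPC lookup by hash) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    hash: str
    parent_hash: str


def find_fork_point(indexed, new_head, fetch_header):
    """Return (fork_height, new_blocks) for an incoming head.

    indexed: dict mapping height -> hash of blocks already indexed.
    fetch_header: hypothetical callable hash -> Header (a node RPC in practice).
    new_blocks holds the canonical headers above the fork point, oldest first.
    """
    new_blocks = []
    cursor = new_head
    # Walk back along the new chain until we reach a block we already indexed.
    while indexed.get(cursor.number) != cursor.hash:
        new_blocks.append(cursor)
        if cursor.number == 0:
            raise RuntimeError("no common ancestor with the indexed chain")
        cursor = fetch_header(cursor.parent_hash)
    new_blocks.reverse()
    return cursor.number, new_blocks


def apply_head(indexed, new_head, fetch_header):
    """Roll back orphaned heights, index the canonical ones, return the orphans."""
    fork_height, new_blocks = find_fork_point(indexed, new_head, fetch_header)
    orphaned = sorted(h for h in indexed if h > fork_height)
    for height in orphaned:
        # A real indexer would mark the block non-canonical and reverse its
        # state changes here instead of simply forgetting it.
        del indexed[height]
    for header in new_blocks:
        indexed[header.number] = header.hash
    return orphaned
```

When the new head simply extends the indexed chain, the walk stops after one step and nothing is rolled back, so the same code path handles both the normal case and a reorg.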

Question 2

What is the challenge of computing wallet balances and how do explorers solve it efficiently?

Accepted Answer

Computing a balance by summing all transactions for an address at query time is O(n) in the number of transactions. An address like Binance has millions of transactions, so this would take seconds per query. Explorers instead maintain a materialized balance index: a separate address_balances table whose current_balance is updated incrementally as each transaction is indexed. On an incoming transfer (the address receives ETH): balance += value. On an outgoing transfer (the address sends ETH): balance -= (value + gas_fee), where gas_fee is the gas used multiplied by the effective gas price. This makes a balance lookup O(1). The challenge is correctness during reorgs: when rolling back a block, every balance change made by that block's transactions must be reversed. Track the block_number of the last update per address so consistency can be validated.
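A minimal sketch of such an incremental index with reorg rollback is shown below. The BalanceIndex class and its per-block delta log are illustrative assumptions, and the sketch deliberately ignores block rewards, internal transactions, and failed transactions.

```python
from collections import defaultdict


class BalanceIndex:
    """In-memory stand-in for an address_balances table (amounts in wei)."""

    def __init__(self):
        self.balances = defaultdict(int)  # address -> current balance
        self.last_block = {}              # address -> block_number of last update
        self.deltas_by_block = {}         # block_number -> [(address, delta)]

    def apply_block(self, block_number, transfers):
        """transfers: iterable of (sender, recipient, value, gas_fee)."""
        deltas = []
        for sender, recipient, value, gas_fee in transfers:
            deltas.append((sender, -(value + gas_fee)))  # sender pays value + fee
            deltas.append((recipient, value))            # recipient gets value
        for address, delta in deltas:
            self.balances[address] += delta
            self.last_block[address] = block_number
        # Keep the deltas so the block can be reversed exactly during a reorg.
        self.deltas_by_block[block_number] = deltas

    def rollback_block(self, block_number):
        """Reverse every balance change recorded for an orphaned block."""
        for address, delta in self.deltas_by_block.pop(block_number):
            self.balances[address] -= delta

    def balance_of(self, address):
        return self.balances[address]  # O(1) lookup
```

Storing the deltas per block means a rollback never has to re-read the orphaned transactions from the node; the log only needs to be retained for as many blocks as the rollback buffer covers.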

Question 3

How do you handle content deduplication in a document storage system?

Accepted Answer

Content deduplication (single-instancing) stores identical files once, regardless of how many times they are uploaded. Implementation: compute a SHA-256 checksum of the file on the client before upload and send it to the server in the upload initiation request. The server checks whether a DocumentVersion with this checksum already exists. If yes: create the new DocumentVersion record pointing to the existing S3 object, with no upload needed. If no: proceed with the normal S3 upload. The storage_key in DocumentVersion is the S3 object key, and multiple DocumentVersion records can reference the same object. Space savings can be in the range of 20-40% for typical enterprise document stores, where users frequently share and re-upload the same files.
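The upload flow above can be sketched with an in-memory store. The DocumentStore class, its method names, and the "objects/<checksum>" key layout are hypothetical; a dict stands in for the S3 bucket, and recomputing the checksum on the server is an extra precaution added in this sketch rather than something the answer specifies.

```python
import hashlib


class DocumentStore:
    def __init__(self):
        self.objects = {}          # storage_key -> bytes (stands in for S3)
        self.key_by_checksum = {}  # checksum -> storage_key
        self.versions = []         # DocumentVersion rows: (document_id, storage_key)

    def initiate_upload(self, document_id, checksum):
        """Return True if the client must upload, False if deduplicated."""
        key = self.key_by_checksum.get(checksum)
        if key is None:
            return True
        # Content already stored: add a new version row pointing at it.
        self.versions.append((document_id, key))
        return False

    def complete_upload(self, document_id, data):
        """Store the bytes (once) and record a version that references them."""
        checksum = hashlib.sha256(data).hexdigest()  # recomputed, not trusted
        key = self.key_by_checksum.setdefault(checksum, f"objects/{checksum}")
        self.objects.setdefault(key, data)
        self.versions.append((document_id, key))
        return key


def client_checksum(data):
    """What the client would send in the upload initiation request."""
    return hashlib.sha256(data).hexdigest()
```

Because several version rows can share one object, deleting a version in a real system would need a reference count or similar check before the underlying object is removed.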

Question 4

How do you implement a materialized path for folder hierarchy queries?

Accepted Answer

A materialized path stores the full path from the root to the node as a string column on each folder (e.g., "/root/projects/q1/"). Operations: find all descendants of folder F: WHERE path LIKE '/root/projects/q1/%'. Find all ancestors of folder F: WHERE '/root/projects/q1/' LIKE path || '%'. Move a subtree: UPDATE folders SET path = new_path || substr(path, length(old_path) + 1) WHERE path LIKE old_path || '%'. Rewriting only the prefix is safer than REPLACE(path, old_path, new_path), which would also rewrite any later occurrence of old_path inside a longer path. Index: in PostgreSQL, CREATE INDEX ON folders(path text_pattern_ops) enables efficient LIKE prefix queries. Folder names containing the LIKE wildcards % or _ must be escaped before being used in a pattern. Alternative: an adjacency list (parent_id only) is simpler to update but requires recursive CTEs for traversal. A materialized path is preferred when reads (tree traversal) dominate and moves are rare.
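The three operations can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, so the text_pattern_ops index is omitted; the table layout, sample folders, and helper names are assumptions for illustration. Unlike the bare queries above, the helpers exclude the folder itself from its own descendants and ancestors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE folders (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")
conn.executemany(
    "INSERT INTO folders (path) VALUES (?)",
    [
        ("/root/",),
        ("/root/projects/",),
        ("/root/projects/q1/",),
        ("/root/projects/q1/reports/",),
        ("/root/archive/",),
    ],
)


def descendants(path):
    """Folders strictly below `path` (prefix match on the stored path)."""
    rows = conn.execute(
        "SELECT path FROM folders WHERE path LIKE ? || '%' AND path != ? ORDER BY path",
        (path, path),
    )
    return [row[0] for row in rows]


def ancestors(path):
    """Folders strictly above `path` (their path is a prefix of ours)."""
    rows = conn.execute(
        "SELECT path FROM folders WHERE ? LIKE path || '%' AND path != ? ORDER BY path",
        (path, path),
    )
    return [row[0] for row in rows]


def move_subtree(old_path, new_path):
    """Rewrite only the leading old_path prefix of every row in the subtree."""
    with conn:  # one transaction: the whole subtree moves or nothing does
        conn.execute(
            "UPDATE folders SET path = ? || substr(path, ?) WHERE path LIKE ? || '%'",
            (new_path, len(old_path) + 1, old_path),
        )
```

A move touches every row in the subtree, which is why this layout suits workloads where moves are rare compared with traversals.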

Question 5

What bitwise trick detects if a number is a power of two, and why does it work?

Accepted Answer

(n & (n-1)) == 0 (for n > 0) detects powers of two. The parentheses matter: in C-family languages == binds more tightly than &, so n & (n-1) == 0 would be parsed as n & ((n-1) == 0). Why it works: a power of two has exactly one 1-bit set (e.g., 8 = 1000). Subtracting 1 flips that bit and sets all lower bits (7 = 0111). In general, n & (n-1) clears the lowest set bit of n. If n has exactly one 1-bit, n & (n-1) = 0. If n has more than one 1-bit, n & (n-1) != 0 (only the lowest 1-bit is cleared; the others remain). The same trick (n &= n-1) is used to compute the Hamming weight: count how many times you can clear a 1-bit before n reaches 0. Each iteration removes exactly one 1-bit, so the loop runs exactly popcount(n) times, making it O(number of set bits) rather than a fixed 32 iterations for a 32-bit word.
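Both uses of the trick fit in a few lines. The function names below are illustrative, and the sketch rejects negative inputs because Python integers are arbitrary-precision and the clearing loop would not terminate for them.

```python
def is_power_of_two(n: int) -> bool:
    """True when n has exactly one bit set; zero and negatives are excluded."""
    return n > 0 and (n & (n - 1)) == 0


def popcount(n: int) -> int:
    """Count set bits by repeatedly clearing the lowest one."""
    if n < 0:
        raise ValueError("popcount is only defined here for non-negative integers")
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 1
    return count
```

For example, 12 is 1100 in binary: the first iteration leaves 1000, the second leaves 0, so popcount(12) is 2 and is_power_of_two(12) is False.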

System Design: Blockchain Explorer — Indexing, Transaction Search, and Address Analytics (2025)

What is a Blockchain Explorer?

Indexing Architecture

Database Schema and Indexing

Address Analytics and Balance Computation

Serving Layer and API