System Design: Blockchain Explorer — Indexing, Transaction Search, and Address Analytics (2025)

What is a Blockchain Explorer?

A blockchain explorer is a read-only interface to blockchain data — like a search engine for the blockchain. Users look up: transactions by hash, blocks by height or hash, wallet addresses (balance, transaction history), smart contract state and events, and network statistics (TPS, fees, validator set). The core challenge: blockchain nodes store data optimized for consensus and validation, not for ad-hoc queries. A raw node has no index on sender address, no full-text search, and no aggregations. The explorer builds and maintains a secondary database that indexes the blockchain data for efficient queries. Scale: Ethereum processes ~1M transactions per day; a full explorer must index the entire chain history (billions of transactions) and stay in real-time sync with new blocks (~12 second block time on Ethereum).

Indexing Architecture

Two phases: historical backfill and real-time ingestion. Historical backfill: run a parallel ETL pipeline that fetches blocks from genesis to current head. Parallelize by block range (worker 1: blocks 0-1M, worker 2: 1M-2M, etc.). Each worker fetches block data via the node’s JSON-RPC API (eth_getBlockByNumber, eth_getTransactionReceipt) and writes to the database. Backfill rate for a well-tuned pipeline: 5-50 blocks per second. Full Ethereum history (~20M blocks): 4-14 days with parallelism. Real-time ingestion: subscribe to new block events (WebSocket subscription: eth_subscribe “newHeads”). On each new block: fetch full block + all transaction receipts + decode event logs. Write to the database atomically. Handle chain reorganizations (reorgs): when a reorg is detected (canonical chain changes), rollback the orphaned blocks and re-index the canonical blocks.

Database Schema and Indexing

-- Blocks
CREATE TABLE blocks (
    block_number BIGINT PRIMARY KEY,
    block_hash   CHAR(66) UNIQUE,
    parent_hash  CHAR(66),
    timestamp    TIMESTAMPTZ,
    miner        CHAR(42),
    gas_used     BIGINT,
    gas_limit    BIGINT,
    tx_count     INT,
    base_fee     NUMERIC(38, 0)  -- wei
);

-- Transactions
CREATE TABLE transactions (
    tx_hash       CHAR(66) PRIMARY KEY,
    block_number  BIGINT REFERENCES blocks,
    tx_index      INT,
    from_address  CHAR(42) NOT NULL,
    to_address    CHAR(42),  -- NULL for contract creation
    value         NUMERIC(38, 0),
    gas_price     NUMERIC(38, 0),
    gas_used      BIGINT,
    status        SMALLINT,  -- 1=success, 0=reverted
    input_data    BYTEA,
    created_at    TIMESTAMPTZ
);
CREATE INDEX idx_tx_from   ON transactions(from_address, block_number DESC);
CREATE INDEX idx_tx_to     ON transactions(to_address, block_number DESC);
CREATE INDEX idx_tx_block  ON transactions(block_number, tx_index);

-- Event logs
CREATE TABLE event_logs (
    log_id       BIGSERIAL PRIMARY KEY,
    tx_hash      CHAR(66) REFERENCES transactions,
    log_index    INT,
    contract     CHAR(42),
    topic0       CHAR(66),  -- event signature hash
    topic1       CHAR(66),
    topic2       CHAR(66),
    data         BYTEA
);
CREATE INDEX idx_log_contract_topic ON event_logs(contract, topic0, log_id DESC);

Address Analytics and Balance Computation

Computing the current balance of an address by summing all incoming and outgoing transactions is too slow at query time (an address may have millions of transactions). Two approaches: Balance index: maintain a running balance per address. On each transaction: UPDATE address_balances SET balance = balance – value WHERE address = from. UPDATE address_balances SET balance = balance + value WHERE address = to. Atomic with the transaction insert. Fast balance lookups: O(1). UTXO model (Bitcoin): balance = sum of unspent outputs. Maintain a utxo table; mark UTXOs as spent on each transaction. Balance = SUM(value) WHERE address = ? AND spent_at IS NULL. For address analytics: precompute aggregates (tx_count, first_seen, last_seen, total_received, total_sent) incrementally as new transactions are indexed.

Serving Layer and API

API endpoints: GET /tx/{hash} — transaction detail with receipt and decoded logs. GET /block/{number_or_hash} — block with paginated transaction list. GET /address/{addr}/txs?page=&limit= — paginated transaction history for an address. GET /address/{addr}/balance — current balance. GET /search?q= — search by tx hash, block hash, address, or ENS name. Search: determine query type by format: 0x + 64 hex chars = tx or block hash. 0x + 40 hex chars = address. Integer = block number. String ending in .eth = ENS name (resolve via ENS contract call). Caching: finalized blocks and transactions are immutable — cache aggressively (TTL = 1 hour or infinite). Pending/unconfirmed transactions: no caching (state changes every second). Address balances: short TTL (5s) — updated on every new block. Use Redis for hot data (recent blocks, popular address balances).


{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How does a blockchain explorer stay synchronized with the chain in real time?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The explorer subscribes to new block events via the node's WebSocket API (eth_subscribe "newHeads" on Ethereum). When a new block header arrives, the explorer fetches the full block data and all transaction receipts via JSON-RPC. It processes and writes to the database atomically, then acknowledges the block as indexed. Chain reorganizations (reorgs) are detected when a new block's parent hash does not match the last indexed block's hash. On reorg: identify the fork point by walking back parent hashes, rollback all blocks on the orphaned fork (mark as non-canonical, reverse balance/state changes), then index the canonical blocks. Reorg depth rarely exceeds 2-3 blocks on Ethereum; a rollback buffer of 10 blocks is sufficient for most chains.”}},{“@type”:”Question”,”name”:”What is the challenge of computing wallet balances and how do explorers solve it efficiently?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Computing a balance by summing all transactions for an address at query time is O(transactions) — an address like Binance has millions of transactions and this would take seconds. Explorers maintain a materialized balance index: a separate address_balances table with current_balance updated incrementally on each indexed transaction. On a BUY transaction (receiving ETH): balance += value. On a SEND transaction: balance -= (value + gas_fee). This makes balance lookup O(1). The challenge is correctness during reorgs: when rolling back a block, reverse all balance changes for that block's transactions. Track the block_number of the last update per address to validate consistency.”}},{“@type”:”Question”,”name”:”How do you handle content deduplication in a document storage system?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Content deduplication (single-instancing) stores identical files once regardless of how many times they are uploaded. Implementation: compute a SHA-256 checksum of the file on the client before upload. Send the checksum to the server in the upload initiation request. The server checks if any DocumentVersion with this checksum already exists in the storage backend. If yes: create the new DocumentVersion record pointing to the existing S3 object — no upload needed. If no: proceed with the normal S3 upload. The storage_key in DocumentVersion is the S3 object key; multiple DocumentVersion records can reference the same object. Space savings: 20-40% for typical enterprise document stores where users frequently share and re-upload the same files.”}},{“@type”:”Question”,”name”:”How do you implement a materialized path for folder hierarchy queries?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”A materialized path stores the full path from root to the node as a string column on each folder (e.g., "/root/projects/q1/"). Operations: find all descendants of folder F: WHERE path LIKE '/root/projects/q1/%'. Find all ancestors of folder F: WHERE '/root/projects/q1/' LIKE path || '%'. Move a subtree: UPDATE folders SET path = REPLACE(path, old_path, new_path) WHERE path LIKE old_path || '%'. Index: CREATE INDEX ON folders(path text_pattern_ops) enables efficient LIKE prefix queries. Alternative: adjacency list (parent_id only) is simpler to update but requires recursive CTEs for traversal. Materialized path is preferred when reads dominate (tree traversal) and moves are rare.”}},{“@type”:”Question”,”name”:”What bitwise trick detects if a number is a power of two, and why does it work?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”n & (n-1) == 0 (for n > 0) detects powers of two. Why: a power of two has exactly one 1-bit set (e.g., 8 = 1000). n-1 flips that bit and sets all lower bits (7 = 0111). n & (n-1) clears the lowest set bit. If n has exactly one 1-bit, n & (n-1) = 0. If n has more than one 1-bit, n & (n-1) != 0 (only the lowest 1-bit is cleared, others remain). The same trick (n &= n-1) is used in Hamming weight computation: count how many times you can clear a 1-bit before n reaches 0 — each iteration removes exactly one 1-bit, so the loop runs exactly popcount(n) times, making it O(number of set bits) rather than O(32).”}}]}

See also: Coinbase Interview Prep

See also: Stripe Interview Prep

See also: Databricks Interview Prep

Scroll to Top