System Design: Blockchain Explorer — Indexing, Transaction Search, and Address Analytics (2025)

What is a Blockchain Explorer?

A blockchain explorer is a read-only interface to blockchain data — like a search engine for the blockchain. Users look up: transactions by hash, blocks by height or hash, wallet addresses (balance, transaction history), smart contract state and events, and network statistics (TPS, fees, validator set). The core challenge: blockchain nodes store data optimized for consensus and validation, not for ad-hoc queries. A raw node has no index on sender address, no full-text search, and no aggregations. The explorer builds and maintains a secondary database that indexes the blockchain data for efficient queries. Scale: Ethereum processes ~1M transactions per day; a full explorer must index the entire chain history (billions of transactions) and stay in real-time sync with new blocks (~12 second block time on Ethereum).

Indexing Architecture

Two phases: historical backfill and real-time ingestion. Historical backfill: run a parallel ETL pipeline that fetches blocks from genesis to current head. Parallelize by block range (worker 1: blocks 0-1M, worker 2: 1M-2M, etc.). Each worker fetches block data via the node’s JSON-RPC API (eth_getBlockByNumber, eth_getTransactionReceipt) and writes to the database. Backfill rate for a well-tuned pipeline: 5-50 blocks per second. Full Ethereum history (~20M blocks): 4-14 days with parallelism. Real-time ingestion: subscribe to new block events (WebSocket subscription: eth_subscribe “newHeads”). On each new block: fetch full block + all transaction receipts + decode event logs. Write to the database atomically. Handle chain reorganizations (reorgs): when a reorg is detected (canonical chain changes), rollback the orphaned blocks and re-index the canonical blocks.

Database Schema and Indexing

-- Blocks
CREATE TABLE blocks (
    block_number BIGINT PRIMARY KEY,
    block_hash   CHAR(66) UNIQUE,
    parent_hash  CHAR(66),
    timestamp    TIMESTAMPTZ,
    miner        CHAR(42),
    gas_used     BIGINT,
    gas_limit    BIGINT,
    tx_count     INT,
    base_fee     NUMERIC(38, 0)  -- wei
);

-- Transactions
CREATE TABLE transactions (
    tx_hash       CHAR(66) PRIMARY KEY,
    block_number  BIGINT REFERENCES blocks,
    tx_index      INT,
    from_address  CHAR(42) NOT NULL,
    to_address    CHAR(42),  -- NULL for contract creation
    value         NUMERIC(38, 0),
    gas_price     NUMERIC(38, 0),
    gas_used      BIGINT,
    status        SMALLINT,  -- 1=success, 0=reverted
    input_data    BYTEA,
    created_at    TIMESTAMPTZ
);
CREATE INDEX idx_tx_from   ON transactions(from_address, block_number DESC);
CREATE INDEX idx_tx_to     ON transactions(to_address, block_number DESC);
CREATE INDEX idx_tx_block  ON transactions(block_number, tx_index);

-- Event logs
CREATE TABLE event_logs (
    log_id       BIGSERIAL PRIMARY KEY,
    tx_hash      CHAR(66) REFERENCES transactions,
    log_index    INT,
    contract     CHAR(42),
    topic0       CHAR(66),  -- event signature hash
    topic1       CHAR(66),
    topic2       CHAR(66),
    data         BYTEA
);
CREATE INDEX idx_log_contract_topic ON event_logs(contract, topic0, log_id DESC);

Address Analytics and Balance Computation

Computing the current balance of an address by summing all incoming and outgoing transactions is too slow at query time (an address may have millions of transactions). Two approaches: Balance index: maintain a running balance per address. On each transaction: UPDATE address_balances SET balance = balance – value WHERE address = from. UPDATE address_balances SET balance = balance + value WHERE address = to. Atomic with the transaction insert. Fast balance lookups: O(1). UTXO model (Bitcoin): balance = sum of unspent outputs. Maintain a utxo table; mark UTXOs as spent on each transaction. Balance = SUM(value) WHERE address = ? AND spent_at IS NULL. For address analytics: precompute aggregates (tx_count, first_seen, last_seen, total_received, total_sent) incrementally as new transactions are indexed.

Serving Layer and API

API endpoints: GET /tx/{hash} — transaction detail with receipt and decoded logs. GET /block/{number_or_hash} — block with paginated transaction list. GET /address/{addr}/txs?page=&limit= — paginated transaction history for an address. GET /address/{addr}/balance — current balance. GET /search?q= — search by tx hash, block hash, address, or ENS name. Search: determine query type by format: 0x + 64 hex chars = tx or block hash. 0x + 40 hex chars = address. Integer = block number. String ending in .eth = ENS name (resolve via ENS contract call). Caching: finalized blocks and transactions are immutable — cache aggressively (TTL = 1 hour or infinite). Pending/unconfirmed transactions: no caching (state changes every second). Address balances: short TTL (5s) — updated on every new block. Use Redis for hot data (recent blocks, popular address balances).

See also: Coinbase Interview Prep

See also: Stripe Interview Prep

See also: Databricks Interview Prep

Scroll to Top