What is a Blockchain Explorer?
A blockchain explorer is a read-only interface to blockchain data — like a search engine for the blockchain. Users look up: transactions by hash, blocks by height or hash, wallet addresses (balance, transaction history), smart contract state and events, and network statistics (TPS, fees, validator set). The core challenge: blockchain nodes store data optimized for consensus and validation, not for ad-hoc queries. A raw node has no index on sender address, no full-text search, and no aggregations. The explorer builds and maintains a secondary database that indexes the blockchain data for efficient queries. Scale: Ethereum processes ~1M transactions per day; a full explorer must index the entire chain history (billions of transactions) and stay in real-time sync with new blocks (~12 second block time on Ethereum).
Indexing Architecture
Two phases: historical backfill and real-time ingestion. Historical backfill: run a parallel ETL pipeline that fetches blocks from genesis to current head. Parallelize by block range (worker 1: blocks 0-1M, worker 2: 1M-2M, etc.). Each worker fetches block data via the node’s JSON-RPC API (eth_getBlockByNumber, eth_getTransactionReceipt) and writes to the database. Backfill rate for a well-tuned pipeline: 5-50 blocks per second. Full Ethereum history (~20M blocks): 4-14 days with parallelism. Real-time ingestion: subscribe to new block events (WebSocket subscription: eth_subscribe “newHeads”). On each new block: fetch full block + all transaction receipts + decode event logs. Write to the database atomically. Handle chain reorganizations (reorgs): when a reorg is detected (canonical chain changes), rollback the orphaned blocks and re-index the canonical blocks.
Database Schema and Indexing
-- Blocks
CREATE TABLE blocks (
block_number BIGINT PRIMARY KEY,
block_hash CHAR(66) UNIQUE,
parent_hash CHAR(66),
timestamp TIMESTAMPTZ,
miner CHAR(42),
gas_used BIGINT,
gas_limit BIGINT,
tx_count INT,
base_fee NUMERIC(38, 0) -- wei
);
-- Transactions
CREATE TABLE transactions (
tx_hash CHAR(66) PRIMARY KEY,
block_number BIGINT REFERENCES blocks,
tx_index INT,
from_address CHAR(42) NOT NULL,
to_address CHAR(42), -- NULL for contract creation
value NUMERIC(38, 0),
gas_price NUMERIC(38, 0),
gas_used BIGINT,
status SMALLINT, -- 1=success, 0=reverted
input_data BYTEA,
created_at TIMESTAMPTZ
);
CREATE INDEX idx_tx_from ON transactions(from_address, block_number DESC);
CREATE INDEX idx_tx_to ON transactions(to_address, block_number DESC);
CREATE INDEX idx_tx_block ON transactions(block_number, tx_index);
-- Event logs
CREATE TABLE event_logs (
log_id BIGSERIAL PRIMARY KEY,
tx_hash CHAR(66) REFERENCES transactions,
log_index INT,
contract CHAR(42),
topic0 CHAR(66), -- event signature hash
topic1 CHAR(66),
topic2 CHAR(66),
data BYTEA
);
CREATE INDEX idx_log_contract_topic ON event_logs(contract, topic0, log_id DESC);
Address Analytics and Balance Computation
Computing the current balance of an address by summing all incoming and outgoing transactions is too slow at query time (an address may have millions of transactions). Two approaches: Balance index: maintain a running balance per address. On each transaction: UPDATE address_balances SET balance = balance – value WHERE address = from. UPDATE address_balances SET balance = balance + value WHERE address = to. Atomic with the transaction insert. Fast balance lookups: O(1). UTXO model (Bitcoin): balance = sum of unspent outputs. Maintain a utxo table; mark UTXOs as spent on each transaction. Balance = SUM(value) WHERE address = ? AND spent_at IS NULL. For address analytics: precompute aggregates (tx_count, first_seen, last_seen, total_received, total_sent) incrementally as new transactions are indexed.
Serving Layer and API
API endpoints: GET /tx/{hash} — transaction detail with receipt and decoded logs. GET /block/{number_or_hash} — block with paginated transaction list. GET /address/{addr}/txs?page=&limit= — paginated transaction history for an address. GET /address/{addr}/balance — current balance. GET /search?q= — search by tx hash, block hash, address, or ENS name. Search: determine query type by format: 0x + 64 hex chars = tx or block hash. 0x + 40 hex chars = address. Integer = block number. String ending in .eth = ENS name (resolve via ENS contract call). Caching: finalized blocks and transactions are immutable — cache aggressively (TTL = 1 hour or infinite). Pending/unconfirmed transactions: no caching (state changes every second). Address balances: short TTL (5s) — updated on every new block. Use Redis for hot data (recent blocks, popular address balances).
See also: Coinbase Interview Prep
See also: Stripe Interview Prep
See also: Databricks Interview Prep