System Design Interview: Design a Geospatial Service (Nearby Search)

Geospatial services power “find restaurants near me,” Uber driver lookup, and Airbnb property search. The core challenge is efficiently querying for points within a radius or bounding box from billions of stored locations. This guide covers the data structures and databases used in production.

Requirements

Functional: store location data (latitude, longitude) for entities (restaurants, drivers, properties), query all entities within radius R of a given point, query within a bounding box, support real-time updates (moving drivers), support metadata filtering (category, rating).

Non-functional: 1B locations stored, <100ms radius query, 100K writes/second (moving drivers), support for global scale.

Why Naive Approaches Fail

-- Naive: compute distance for all 1B rows:
SELECT id, lat, lng,
       ST_Distance(point(lat, lng), point(user_lat, user_lng)) AS dist
FROM locations
WHERE dist < 1000  -- meters
ORDER BY dist;
-- Problem: full table scan on 1B rows = too slow

We need a spatial index that eliminates the vast majority of rows before computing distances.

Geohash

Geohash encodes a (lat, lng) coordinate into a string of characters. Properties:

  • Characters are added to increase precision: “9q8y” covers a 40km² cell; “9q8yy” covers a 1.2km² cell; “9q8yyh” covers 150m²
  • Nearby locations share a common prefix (usually, with exceptions at cell boundaries)
  • 8 characters ≈ 20-meter precision. 6 characters ≈ 1.2km precision.
import geohash2 as geohash

# Encode location:
gh = geohash.encode(37.7749, -122.4194, precision=7)  # "9q8yy9m"

# Find all geohash cells covering a radius:
neighbors = geohash.neighbors("9q8yy9m")   # 8 surrounding cells
cells = [gh] + neighbors   # 9 cells to search

# Query: find all entities whose geohash starts with any of the 9 cells
SELECT * FROM locations WHERE geohash7 IN ('9q8yy9m', '9q8yy9n', ...)

Index on geohash prefix column: O(log n) B-tree lookup. The 9-cell approach handles the boundary problem (entities near a cell edge are in an adjacent cell but still within the radius).

Quadtree

A quadtree recursively divides a 2D area into quadrants. Each leaf node represents a region with <= K points (e.g., K=100). Querying: start at root, recurse into all quadrants that overlap the search circle. Advantages over geohash: dynamically adjusts to data density (urban areas have more splits, rural areas fewer).

class QuadTreeNode:
    def __init__(self, bounds):
        self.bounds = bounds      # (min_lat, min_lng, max_lat, max_lng)
        self.points = []          # up to K points if leaf
        self.children = None      # [NW, NE, SW, SE] if internal node

    def insert(self, point):
        if self.children:
            quadrant = self.get_quadrant(point)
            quadrant.insert(point)
        else:
            self.points.append(point)
            if len(self.points) > K:
                self.subdivide()

    def query_radius(self, center, radius, results):
        if not self.overlaps_circle(center, radius): return
        if self.children:
            for child in self.children:
                child.query_radius(center, radius, results)
        else:
            for p in self.points:
                if distance(center, p) <= radius:
                    results.append(p)

PostGIS (PostgreSQL Extension)

For production systems, PostGIS provides native spatial types and indexes:

-- Schema:
CREATE TABLE locations (
    id       BIGSERIAL PRIMARY KEY,
    name     TEXT,
    category TEXT,
    geom     GEOGRAPHY(POINT, 4326)   -- WGS84 coordinate system
);
CREATE INDEX idx_geom ON locations USING GIST(geom);

-- Insert:
INSERT INTO locations (name, geom) VALUES
  ('Philz Coffee', ST_MakePoint(-122.4194, 37.7749)::geography);

-- Radius query (1 km):
SELECT id, name,
       ST_Distance(geom, ST_MakePoint(-122.4194, 37.7749)::geography) AS dist_m
FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-122.4194, 37.7749)::geography, 1000)
ORDER BY dist_m
LIMIT 20;
-- Uses GIST index — only examines nearby cells, not full table scan

Redis GEO for Real-Time Data

For frequently-updated locations (drivers, riders):

GEOADD drivers -122.4194 37.7749 "driver_123"
GEOADD drivers -122.4100 37.7800 "driver_456"

-- Find drivers within 5km:
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10
→ ["driver_456", "driver_123"]

-- With distance:
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10 WITHDIST
→ [["driver_456", 0.8], ["driver_123", 0.0]]

Redis GEO stores coordinates as geohash scores in a sorted set. GEORADIUS executes in O(N+log M) where N is results and M is total entries in the key. Sharding: partition by region/city to keep each key manageable.

Interview Tips

  • Lead with why a database index alone is insufficient — 2D coordinates cannot be efficiently indexed with a 1D B-tree
  • Geohash is the most interviewer-friendly answer — easy to explain, widely used
  • The 9-cell (geohash + 8 neighbors) approach solves the boundary problem — mention it
  • Redis GEO for real-time moving objects (drivers), PostGIS for static persistent data (restaurants)
  • For scale: shard by region, keep each Redis GEO key to <1M entries

  • Twitter Interview Guide
  • Snap Interview Guide
  • Airbnb Interview Guide
  • DoorDash Interview Guide
  • Lyft Interview Guide
  • Uber Interview Guide
  • Scroll to Top