System Design Interview: Design a Geospatial Service (Nearby Search)
Geospatial services power “find restaurants near me,” Uber driver lookup, and Airbnb property search. The core challenge is efficiently querying for points within a radius or bounding box from billions of stored locations. This guide covers the data structures and databases used in production.
Requirements
Functional: store location data (latitude, longitude) for entities (restaurants, drivers, properties), query all entities within radius R of a given point, query within a bounding box, support real-time updates (moving drivers), support metadata filtering (category, rating).
Non-functional: 1B locations stored, <100ms radius query, 100K writes/second (moving drivers), support for global scale.
Why Naive Approaches Fail
-- Naive: compute distance for all 1B rows:
SELECT id, lat, lng,
ST_Distance(point(lat, lng), point(user_lat, user_lng)) AS dist
FROM locations
WHERE dist < 1000 -- meters
ORDER BY dist;
-- Problem: full table scan on 1B rows = too slow
We need a spatial index that eliminates the vast majority of rows before computing distances.
Geohash
Geohash encodes a (lat, lng) coordinate into a string of characters. Properties:
- Characters are added to increase precision: “9q8y” covers a 40km² cell; “9q8yy” covers a 1.2km² cell; “9q8yyh” covers 150m²
- Nearby locations share a common prefix (usually, with exceptions at cell boundaries)
- 8 characters ≈ 20-meter precision. 6 characters ≈ 1.2km precision.
import geohash2 as geohash
# Encode location:
gh = geohash.encode(37.7749, -122.4194, precision=7) # "9q8yy9m"
# Find all geohash cells covering a radius:
neighbors = geohash.neighbors("9q8yy9m") # 8 surrounding cells
cells = [gh] + neighbors # 9 cells to search
# Query: find all entities whose geohash starts with any of the 9 cells
SELECT * FROM locations WHERE geohash7 IN ('9q8yy9m', '9q8yy9n', ...)
Index on geohash prefix column: O(log n) B-tree lookup. The 9-cell approach handles the boundary problem (entities near a cell edge are in an adjacent cell but still within the radius).
Quadtree
A quadtree recursively divides a 2D area into quadrants. Each leaf node represents a region with <= K points (e.g., K=100). Querying: start at root, recurse into all quadrants that overlap the search circle. Advantages over geohash: dynamically adjusts to data density (urban areas have more splits, rural areas fewer).
class QuadTreeNode:
def __init__(self, bounds):
self.bounds = bounds # (min_lat, min_lng, max_lat, max_lng)
self.points = [] # up to K points if leaf
self.children = None # [NW, NE, SW, SE] if internal node
def insert(self, point):
if self.children:
quadrant = self.get_quadrant(point)
quadrant.insert(point)
else:
self.points.append(point)
if len(self.points) > K:
self.subdivide()
def query_radius(self, center, radius, results):
if not self.overlaps_circle(center, radius): return
if self.children:
for child in self.children:
child.query_radius(center, radius, results)
else:
for p in self.points:
if distance(center, p) <= radius:
results.append(p)
PostGIS (PostgreSQL Extension)
For production systems, PostGIS provides native spatial types and indexes:
-- Schema:
CREATE TABLE locations (
id BIGSERIAL PRIMARY KEY,
name TEXT,
category TEXT,
geom GEOGRAPHY(POINT, 4326) -- WGS84 coordinate system
);
CREATE INDEX idx_geom ON locations USING GIST(geom);
-- Insert:
INSERT INTO locations (name, geom) VALUES
('Philz Coffee', ST_MakePoint(-122.4194, 37.7749)::geography);
-- Radius query (1 km):
SELECT id, name,
ST_Distance(geom, ST_MakePoint(-122.4194, 37.7749)::geography) AS dist_m
FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-122.4194, 37.7749)::geography, 1000)
ORDER BY dist_m
LIMIT 20;
-- Uses GIST index — only examines nearby cells, not full table scan
Redis GEO for Real-Time Data
For frequently-updated locations (drivers, riders):
GEOADD drivers -122.4194 37.7749 "driver_123"
GEOADD drivers -122.4100 37.7800 "driver_456"
-- Find drivers within 5km:
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10
→ ["driver_456", "driver_123"]
-- With distance:
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10 WITHDIST
→ [["driver_456", 0.8], ["driver_123", 0.0]]
Redis GEO stores coordinates as geohash scores in a sorted set. GEORADIUS executes in O(N+log M) where N is results and M is total entries in the key. Sharding: partition by region/city to keep each key manageable.
Interview Tips
- Lead with why a database index alone is insufficient — 2D coordinates cannot be efficiently indexed with a 1D B-tree
- Geohash is the most interviewer-friendly answer — easy to explain, widely used
- The 9-cell (geohash + 8 neighbors) approach solves the boundary problem — mention it
- Redis GEO for real-time moving objects (drivers), PostGIS for static persistent data (restaurants)
- For scale: shard by region, keep each Redis GEO key to <1M entries