System Design: Geo-Proximity Service (Yelp / Google Maps Nearby)
A geo-proximity service finds places (restaurants, drivers, stores) near a given location. It powers Yelp’s “restaurants near me,” Uber’s driver matching, and Google Maps’ nearby search. The core challenge: efficiently querying a 2D space for points within a radius — a problem that traditional B-tree indexes handle poorly.
Requirements
Functional: Add/update business locations. Search for businesses within radius R of a point. Filter by category, rating. Sort by distance or relevance. Support 200M businesses globally.
Non-functional: Search latency <100ms, 100K QPS read, eventual consistency on writes, handle global distribution.
Why Standard Indexes Don’t Work
A query “find all restaurants within 1km of (lat=40.7, lng=-74.0)” translates to: WHERE lat BETWEEN 40.69 AND 40.71 AND lng BETWEEN -74.01 AND -73.99. A compound index on (lat, lng) is sorted by lat first, so it can seek only on the latitude band; every row in that band must then be scanned to apply the longitude filter, and a narrow latitude band still wraps the entire globe. Genuine 2D proximity queries need a spatial index.
Geospatial Indexing Approaches
Approach 1: Geohash (Recommended for Interviews)
Geohash divides the world into a grid of cells using a base-32 encoding. Each additional character subdivides a cell into 32 smaller cells, shrinking it by roughly 8× along one axis and 4× along the other (the split alternates between longitude and latitude):
- 4 characters → ~40km × 20km cell
- 6 characters → ~1.2km × 0.6km cell (good for “nearby” searches)
- 8 characters → ~38m × 19m cell (precise location)
Key property: two locations that share a geohash prefix are close to each other. A standard B-tree index on the geohash column handles prefix queries efficiently.
import geohash  # python-geohash: encode(lat, lng, precision) and neighbors(hash)

def find_nearby(lat: float, lng: float, radius_km: float) -> list[Business]:
    # Business, db, and haversine are defined elsewhere in the service
    precision = 6  # precision-6 cells are ~1.2km × 0.6km
    center_hash = geohash.encode(lat, lng, precision)
    # Include neighboring cells (8 neighbors + center) to handle cell boundaries
    neighbor_hashes = geohash.neighbors(center_hash) + [center_hash]
    candidates = db.query(
        "SELECT * FROM businesses WHERE geohash6 LIKE ANY(%s)",
        [h + '%' for h in neighbor_hashes],
    )
    # Filter candidates by exact great-circle distance
    return [b for b in candidates if haversine(lat, lng, b.lat, b.lng) <= radius_km]
Edge case: Cells near grid boundaries have different hashes but are geographically close. Always include 8 neighboring cells, not just the center cell. This ensures a point at the border of a cell is correctly matched with nearby points in the adjacent cell.
Approach 2: Quadtree
A quadtree recursively divides 2D space into four quadrants; a node splits when it holds more than M points (e.g., M=100). Search: traverse from the root, descend into every quadrant that overlaps the search circle, and collect points from the leaf nodes reached. Pros: the tree adapts to point density, so dense cities and sparse countryside get appropriately sized cells. Cons: an in-memory tree is harder to distribute and rebalance than a flat geohash index. Uber's H3 is a related idea but uses a hierarchical hexagonal grid rather than a quadtree.
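A minimal in-memory sketch of the split-and-search behavior described above, treating coordinates as planar for clarity (QuadNode, its fields, and the payload tuples are illustrative names, not from a particular library):

from dataclasses import dataclass, field

@dataclass
class QuadNode:
    x0: float
    y0: float
    x1: float
    y1: float
    M: int = 100                                   # split threshold
    points: list = field(default_factory=list)     # (x, y, payload) tuples held by a leaf
    children: list = field(default_factory=list)   # 4 child nodes once split

    def insert(self, x, y, payload):
        if self.children:                          # internal node: route to the right quadrant
            self._child_for(x, y).insert(x, y, payload)
            return
        self.points.append((x, y, payload))
        if len(self.points) > self.M:              # leaf overflow: split into 4 quadrants
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [
                QuadNode(self.x0, self.y0, mx, my, self.M),
                QuadNode(mx, self.y0, self.x1, my, self.M),
                QuadNode(self.x0, my, mx, self.y1, self.M),
                QuadNode(mx, my, self.x1, self.y1, self.M),
            ]
            for px, py, pl in self.points:
                self._child_for(px, py).insert(px, py, pl)
            self.points = []

    def _child_for(self, x, y):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def search(self, cx, cy, r, out=None):
        """Collect payloads within radius r of (cx, cy), pruning quadrants that miss the circle."""
        out = [] if out is None else out
        # Nearest point of this node's box to the circle center; skip if it lies outside the circle
        nx = min(max(cx, self.x0), self.x1)
        ny = min(max(cy, self.y0), self.y1)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return out
        for px, py, pl in self.points:
            if (px - cx) ** 2 + (py - cy) ** 2 <= r * r:
                out.append(pl)
        for child in self.children:
            child.search(cx, cy, r, out)
        return out

A distributed version would need to shard and rebalance subtrees across servers, which is where the complexity noted in the cons comes from.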
Approach 3: PostGIS / MySQL Spatial
Databases with spatial extensions use R-tree-style indexes (GiST in PostGIS) for geographic data. Query: SELECT * FROM businesses WHERE ST_DWithin(location, ST_MakePoint(-74.0, 40.7)::geography, 1000). ST_DWithin on a geography column measures distance in meters and can use the spatial index, whereas a WHERE filter on ST_Distance cannot. Pros: SQL-based, no custom infrastructure. Cons: limited to single-node throughput; sharding spatial data is complex.
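A short sketch of how a service might issue that query through psycopg2, assuming a businesses table with a geography(Point, 4326) column named location (table and column names carried over from the examples above):

import psycopg2

def find_nearby_postgis(conn, lat: float, lng: float, radius_m: float, limit: int = 50):
    """Radius search pushed down to PostGIS; distances are in meters on the geography type."""
    sql = """
        SELECT id, name,
               ST_Distance(location, ST_MakePoint(%(lng)s, %(lat)s)::geography) AS meters
        FROM businesses
        WHERE ST_DWithin(location, ST_MakePoint(%(lng)s, %(lat)s)::geography, %(radius)s)
        ORDER BY meters
        LIMIT %(limit)s
    """
    with conn.cursor() as cur:
        cur.execute(sql, {"lng": lng, "lat": lat, "radius": radius_m, "limit": limit})
        return cur.fetchall()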
System Architecture
Client → API Gateway → Location Service
↓
[Geohash Cache (Redis)]
↓ (cache miss)
[Location DB (PostgreSQL + PostGIS)]
↓
[Business Service] → Business DB (Cassandra)
Read path: Compute geohash of search center → look up Redis for nearby business IDs (TTL=300s, rarely changes) → cache miss: query Location DB → fetch business details from Business Service → rank by distance/rating → return paginated results.
Write path: Business owner updates location → write to Location DB → update geohash index → invalidate Redis cache for affected geohash cells.
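A sketch of that write path, assuming the nearby:{geohash6}:{category} cache keys described under Caching Strategy below; location_db is an illustrative data-access wrapper and cache is a redis-py client:

import geohash
import redis

cache = redis.Redis()

def update_business_location(business_id: str, new_lat: float, new_lng: float) -> None:
    """Write path: persist the new location, refresh the geohash column, invalidate caches."""
    old_hash = location_db.get_geohash6(business_id)   # illustrative DAO call
    new_hash = geohash.encode(new_lat, new_lng, 6)
    location_db.update(business_id, lat=new_lat, lng=new_lng, geohash6=new_hash)
    # Drop cached "nearby" results for the old and new cells, across all category variants
    for cell in {old_hash, new_hash}:
        for key in cache.scan_iter(match=f"nearby:{cell}:*"):
            cache.delete(key)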
Caching Strategy
Cache key: nearby:{geohash6}:{category}. Value: list of business IDs sorted by distance. TTL: 5 minutes. This works because: (1) most users search the same popular areas; (2) business locations change infrequently; (3) a precision-6 geohash cell covers roughly 1.2 km × 0.6 km, so all businesses in a cell fit in one cache entry.
Hot spots (Times Square, tourist areas) drive much higher QPS against the same cache key; a hot key is a pure in-memory read, and read replicas can absorb the fan-out if a single node is not enough. Expiry-driven thundering herds are avoided by staggering TTLs (randomized ±10% on write), so adjacent cells' entries do not all expire at once.
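A minimal read-through sketch of this strategy with redis-py, reusing find_nearby and haversine from the earlier examples; the jitter range, helper names, and the assumption that Business rows expose id and category fields are illustrative:

import json
import random
import redis
import geohash

cache = redis.Redis()
CACHE_TTL_S = 300  # 5 minutes

def nearby_ids(lat: float, lng: float, category: str) -> list[str]:
    """Read-through cache keyed by geohash cell and category."""
    cell = geohash.encode(lat, lng, 6)
    key = f"nearby:{cell}:{category}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: query the Location DB via find_nearby, sort by distance, then populate
    hits = [b for b in find_nearby(lat, lng, radius_km=1.0) if b.category == category]
    hits.sort(key=lambda b: haversine(lat, lng, b.lat, b.lng))
    ids = [b.id for b in hits]
    ttl = int(CACHE_TTL_S * random.uniform(0.9, 1.1))  # stagger expiry by +/- 10%
    cache.set(key, json.dumps(ids), ex=ttl)
    return ids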
Distance Calculation
import math

def haversine(lat1, lng1, lat2, lng2) -> float:
    """Great-circle distance between two (lat, lng) points, in kilometers."""
    R = 6371  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))
For most proximity searches an equirectangular approximation is sufficient within ~100 km: scale Δlng by cos(latitude), treat the result as Cartesian, and apply Pythagoras. Treating raw lat/lng degrees as Cartesian without that scaling increasingly overstates east-west distances as latitude grows. Use Haversine when you need accurate distances at high latitudes or over large radii.
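For comparison, a sketch of that approximation (the function name is illustrative):

import math

def approx_distance_km(lat1, lng1, lat2, lng2) -> float:
    """Equirectangular approximation: adequate for short distances away from the poles."""
    R = 6371  # mean Earth radius in km
    x = math.radians(lng2 - lng1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return R * math.sqrt(x * x + y * y)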
Handling Real-Time Location Updates (Driver Tracking)
For dynamic locations (drivers, delivery workers): drivers send GPS updates every 4 seconds. Use the H3 hexagonal grid: encode each location to an H3 cell at resolution 9 (average edge length ~174m). Store drivers per cell in Redis; plain sets cannot expire individual members, so use a sorted set with the update timestamp as the score: ZADD cell:{h3_cell} <timestamp> driver_id, pruning entries older than ~30 seconds with ZREMRANGEBYSCORE. On a match request, query the center cell plus a k-ring of neighbors. At roughly 500K concurrent drivers, one update every 4 seconds works out to ~125K location updates/second; Kafka ingests that stream and consumer workers update the Redis cells.
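A sketch of the update and match paths under those assumptions, using the h3 Python bindings (v4 API: latlng_to_cell, grid_disk) and redis-py; key names and the 30-second freshness window are illustrative:

import time
import h3      # h3 Python bindings, v4 API
import redis

r = redis.Redis()
RESOLUTION = 9          # ~174 m average edge length
FRESH_WINDOW_S = 30     # drop drivers we have not heard from recently

def record_driver_location(driver_id: str, lat: float, lng: float) -> None:
    """Called by the Kafka consumer for each GPS update."""
    cell = h3.latlng_to_cell(lat, lng, RESOLUTION)
    now = time.time()
    pipe = r.pipeline()
    pipe.zadd(f"cell:{cell}", {driver_id: now})                       # score = last update time
    pipe.zremrangebyscore(f"cell:{cell}", 0, now - FRESH_WINDOW_S)    # prune stale entries
    pipe.expire(f"cell:{cell}", FRESH_WINDOW_S * 2)                   # let empty cells disappear
    pipe.execute()

def nearby_drivers(lat: float, lng: float, k: int = 1) -> set[str]:
    """Query the center cell plus a k-ring of neighbors."""
    center = h3.latlng_to_cell(lat, lng, RESOLUTION)
    cutoff = time.time() - FRESH_WINDOW_S
    drivers: set[str] = set()
    for cell in h3.grid_disk(center, k):
        members = r.zrangebyscore(f"cell:{cell}", cutoff, "+inf")
        drivers.update(m.decode() for m in members)
    return drivers

Entries a driver leaves behind in a previous cell simply age out within the freshness window, so no explicit cleanup on movement is needed.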
Interview Tips
- Lead with geohash — it’s the simplest explanation of spatial indexing and directly maps to standard DB indexes.
- Always mention the boundary cell problem and the neighbor-cell solution.
- Distinguish static (Yelp) from dynamic (Uber) location use cases — different staleness requirements.
- Haversine vs flat approximation: an equirectangular approximation (scale Δlng by cos(lat)) is fine below ~100km; mention Haversine for precision.
- Quadtree vs Geohash: quadtrees adapt better to density-varying distributions (dense cities, sparse countryside); geohash is simpler to implement and cache.
🏢 Asked at: Uber, Lyft, DoorDash, Airbnb, Snap