System Design Interview: Design a Geo-Proximity Service (Yelp / Nearby)

System Design: Geo-Proximity Service (Yelp / Google Maps Nearby)

A geo-proximity service finds places (restaurants, drivers, stores) near a given location. It powers Yelp’s “restaurants near me,” Uber’s driver matching, and Google Maps’ nearby search. The core challenge: efficiently querying a 2D space for points within a radius — a problem that traditional B-tree indexes handle poorly.

Requirements

Functional: Add/update business locations. Search for businesses within radius R of a point. Filter by category, rating. Sort by distance or relevance. Support 200M businesses globally.

Non-functional: Search latency <100ms, 100K QPS read, eventual consistency on writes, handle global distribution.

Why Standard Indexes Don’t Work

A query “find all restaurants within 1km of (lat=40.7, lng=-74.0)” translates to: WHERE lat BETWEEN 40.69 AND 40.71 AND lng BETWEEN -74.01 AND -73.99. A compound index on (lat, lng) only sorts by lat first — the lng range scan has poor selectivity. Actual 2D proximity queries require spatial indexing.

Geospatial Indexing Approaches

Approach 1: Geohash (Recommended for Interviews)

Geohash divides the world into a grid of cells using a base-32 encoding. Each character added to a geohash string halves the cell size:

  • 4 characters → ~40km × 20km cell
  • 6 characters → ~1.2km × 0.6km cell (good for “nearby” searches)
  • 8 characters → ~38m × 19m cell (precise location)

Key property: two locations that share a geohash prefix are close to each other. A standard B-tree index on the geohash column handles prefix queries efficiently.

def find_nearby(lat: float, lng: float, radius_km: float) -> list[Business]:
    precision = 6   # ~1km cells
    center_hash = geohash.encode(lat, lng, precision)

    # Include neighboring cells (8 neighbors + center)
    neighbor_hashes = geohash.neighbors(center_hash) + [center_hash]

    candidates = db.query(
        "SELECT * FROM businesses WHERE geohash6 LIKE ANY(%s)",
        [h + '%' for h in neighbor_hashes]
    )
    # Filter by exact distance
    return [b for b in candidates if haversine(lat, lng, b.lat, b.lng) <= radius_km]

Edge case: Cells near grid boundaries have different hashes but are geographically close. Always include 8 neighboring cells, not just the center cell. This ensures a point at the border of a cell is correctly matched with nearby points in the adjacent cell.

Approach 2: Quadtree

A quadtree recursively divides 2D space into 4 quadrants. Nodes split when they contain more than M points (M=100). Search: traverse from root, descend into any quadrant that overlaps the search circle, collect points from leaf nodes. Pros: dynamic insertion without hash collision. Cons: complex to implement distributed. Used in Uber’s H3 system (hexagonal quadtrees).

Approach 3: PostGIS / MySQL Spatial

Databases with spatial extensions use R-trees for geographic indexing. Query: SELECT * FROM businesses WHERE ST_Distance(location, ST_Point(-74.0, 40.7)) < 1000. Pros: SQL-based, no custom infrastructure. Cons: limited to single-node throughput; sharding spatial data is complex.

System Architecture

Client → API Gateway → Location Service
                            ↓
              [Geohash Cache (Redis)]
                    ↓ (cache miss)
              [Location DB (PostgreSQL + PostGIS)]
                            ↓
              [Business Service] → Business DB (Cassandra)

Read path: Compute geohash of search center → look up Redis for nearby business IDs (TTL=300s, rarely changes) → cache miss: query Location DB → fetch business details from Business Service → rank by distance/rating → return paginated results.

Write path: Business owner updates location → write to Location DB → update geohash index → invalidate Redis cache for affected geohash cells.

Caching Strategy

Cache key: nearby:{geohash6}:{category}. Value: list of business IDs sorted by distance. TTL: 5 minutes. This works because: (1) Most users search the same popular areas. (2) Business locations change infrequently. (3) The geohash cell covers ~1km² — any business in the cell fits one cache entry.

Hot spots (Times Square, tourist areas): higher QPS, same cache key — Redis handles millions of reads per key. No thundering herd because TTL is staggered (randomized ± 10% on write).

Distance Calculation

import math

def haversine(lat1, lng1, lat2, lng2) -> float:
    """Returns distance in kilometers."""
    R = 6371   # Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
    return 2 * R * math.asin(math.sqrt(a))

For most proximity searches, Euclidean distance approximation (treating lat/lng as Cartesian) is sufficient within 100km. Use Haversine for accurate distances, especially near poles or for large radii.

Handling Real-Time Location Updates (Driver Tracking)

For dynamic locations (drivers, delivery workers): drivers send GPS updates every 4 seconds. Use H3 hexagonal grid: encode location to H3 cell at resolution 9 (~174m). Store in Redis: SADD cell:{h3_cell} driver_id with 30-second TTL per driver entry. On match request: query center cell + k-ring of neighbors. Kafka ingests the ~125K location updates/second, consumer workers update Redis cells.

Interview Tips

  • Lead with geohash — it’s the simplest explanation of spatial indexing and directly maps to standard DB indexes.
  • Always mention the boundary cell problem and the neighbor-cell solution.
  • Distinguish static (Yelp) from dynamic (Uber) location use cases — different staleness requirements.
  • Haversine vs Euclidean: Euclidean is fine for <100km; mention Haversine for precision.
  • QuadTree vs Geohash: QuadTree handles density-varying distributions better (dense cities, sparse countryside); Geohash is simpler to implement and cache.

🏢 Asked at: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering

🏢 Asked at: Lyft Interview Guide 2026: Rideshare Engineering, Real-Time Dispatch, and Safety Systems

🏢 Asked at: DoorDash Interview Guide

🏢 Asked at: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering

🏢 Asked at: Snap Interview Guide

Scroll to Top