System Design: Location Service — Proximity Search, Geohashing, and Real-Time Tracking

Requirements

A location service has two core functions: proximity search (find entities near a given coordinate: restaurants, drivers, friends) and real-time location tracking (store and retrieve the current position of moving entities). Examples: Yelp’s “restaurants near me”, Uber’s driver tracking, WhatsApp’s “share live location”, Pokémon GO.

Scale: Google Maps indexes 200M+ businesses; Uber has 5M+ active drivers, each sending a location update every 4 seconds; Yelp serves 150M+ monthly unique visitors. The key technical challenge is efficient geospatial indexing: a standard B-tree index on (lat, lng) cannot answer 2D radius queries efficiently.

Geospatial Indexing Approaches

Problem: find all entities within R km of (lat, lng). The naive approach computes the distance to every entity, which is O(N) per query and unacceptable at scale. The standard fixes are spatial indexes:

  • Geohash: encode a (lat, lng) pair into a base-32 string (e.g., “9q8yy”). Nearby locations share a common prefix, so the map becomes a grid in which each cell has a geohash. To find nearby entities, look up entities in the query point’s cell and its 8 neighbors with a fast prefix lookup in a standard index (LIKE ‘9q8yy%’). Precision: length 5 ≈ 5km × 5km cells; length 6 ≈ 1.2km × 0.6km cells; length 7 ≈ 150m × 150m cells. Choose the precision to match the search radius.
  • Quadtree: recursively divide 2D space into quadrants until each cell contains ≤ K points. Cell size adapts to density: densely populated areas get smaller cells. Efficient for point-in-polygon and range queries; used by Twitter for geotagged tweets.
  • R-tree: a tree structure in which each node covers a bounding rectangle. Supports range queries and nearest-neighbor search; used by PostGIS, the PostgreSQL spatial extension.
  • Redis GEOSEARCH: Redis natively supports geospatial indexing via GEOADD (store points) and GEORADIUS / GEOSEARCH (radius search). Internally it uses a sorted set with geohash-encoded scores; a radius query is O(N + log M). The simplest production-ready option at moderate scale.
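To make the prefix property concrete, here is a minimal pure-Python geohash encoder using the standard interleaved bisection scheme (longitude bits first). This is an illustrative sketch, not a library API; `geohash_encode` is a name chosen here.

```python
# Minimal geohash encoder: interleave longitude/latitude bisection bits
# (longitude first), then map each 5-bit group to a base-32 character.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    even = True  # even bit positions encode longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            bits.append(1 if lng >= mid else 0)
            if lng >= mid:
                lng_lo = mid
            else:
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            if lat >= mid:
                lat_lo = mid
            else:
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, precision * 5, 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

# Nearby points share a prefix:
# geohash_encode(37.7749, -122.4194, 5) == "9q8yy"  (San Francisco)
```

Truncating the string coarsens the cell, which is why a single indexed column per precision (geohash6, geohash7) supports different search radii.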

Proximity Search Architecture

Read-heavy workload (searches heavily outnumber writes). Location data lives in a Location table (entity_id, entity_type, lat, lng, geohash6, geohash7, updated_at), indexed on geohash6 and geohash7 for prefix queries. A query (lat, lng, radius_km) is served in four steps:

  • Compute the geohash prefix of the query point at the precision matching the radius.
  • Fetch all entities in the 9 relevant geohash cells (the query cell plus its 8 neighbors).
  • Filter to entities within the exact radius (Haversine distance, or Euclidean over short distances).
  • Rank by distance and return the top K.

Caching: for popular fixed queries (e.g., “restaurants near Times Square”), cache the result in Redis for 5 minutes. User-specific queries near a live GPS position are impractical to cache, since the coordinates differ by meters from user to user. Database sharding: shard by geohash prefix (all entities under prefix “9q” on shard 1, “9r” on shard 2, and so on). This clusters nearby entities on the same shard, minimizing cross-shard queries for proximity searches.
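The exact-radius filter and ranking steps can be sketched as below. `rank_within_radius` is a hypothetical helper: it assumes `candidates` are the rows already returned by the 9-cell geohash prefix query.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rank_within_radius(query_lat, query_lng, candidates, radius_km, top_k=20):
    """Exact-radius filter plus distance ranking over candidate rows.

    `candidates` are (entity_id, lat, lng) rows already narrowed by the
    9-cell prefix query (e.g., WHERE geohash6 IN (<query cell + 8 neighbors>)).
    """
    hits = []
    for entity_id, lat, lng in candidates:
        d = haversine_km(query_lat, query_lng, lat, lng)
        if d <= radius_km:
            hits.append((d, entity_id))
    hits.sort()  # nearest first
    return [(eid, round(d, 2)) for d, eid in hits[:top_k]]
```

The exact-distance pass is required because the 9 cells form a square region that over-approximates the circular search radius; corner entities can be farther than radius_km.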

Real-Time Location Tracking

Use case: track the live position of drivers, delivery couriers, or friends. The write load dominates: 5M drivers × 1 update per 4s = 1.25M writes/second, which requires a write-optimized architecture.

Write path: device sends UDP location update → location ingestion service → writes the current position to Redis (fast, for serving) and publishes to Kafka (for durability and async processing). Redis key: SET driver:{id}:location “{lat},{lng},{timestamp}” with a TTL of 30 seconds, so stale locations expire automatically. A Kafka consumer persists location history to Cassandra (time-series, write-optimized) with schema (entity_id, timestamp, lat, lng), partitioned by entity_id.

Read path: a driver’s current location is read from Redis (< 1ms); a driver’s trip history is queried from Cassandra by entity_id and time range. For live updates, clients subscribe over WebSocket or Server-Sent Events; the server subscribes to a Redis pub/sub channel per entity, and each new location published to the channel is forwarded to all subscribers (e.g., the rider’s app) in real time.
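The fast path’s SET-with-TTL pattern can be modeled with a small in-memory sketch. This is a stand-in for the Redis calls, not a Redis client; the class, names, and injected clock are illustrative assumptions.

```python
import time

STALE_AFTER_S = 30  # mirrors the 30-second Redis TTL on driver:{id}:location

class CurrentLocationStore:
    """In-memory stand-in for the Redis fast path: latest position per
    driver, expiring automatically so stale locations are never served."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable for testing
        self._data = {}                # driver_id -> (lat, lng, written_at)

    def update(self, driver_id, lat, lng):
        # Analogous to SET driver:{id}:location ... EX 30 (overwrite + reset TTL)
        self._data[driver_id] = (lat, lng, self._clock())

    def get(self, driver_id):
        entry = self._data.get(driver_id)
        if entry is None:
            return None
        lat, lng, written_at = entry
        if self._clock() - written_at > STALE_AFTER_S:
            del self._data[driver_id]  # emulate TTL expiry
            return None
        return (lat, lng)
```

The TTL is the liveness signal: a driver whose app stops sending updates simply disappears from the store within 30 seconds, with no separate offline-detection job.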

Interview Tips

  • Geohash is the most interview-friendly geospatial index — explain prefix sharing, cell size by precision, and the 9-cell neighbor query.
  • For driver tracking writes at scale, always separate the fast path (Redis for current position) from the durable path (Kafka → Cassandra for history).
  • Redis GEOSEARCH is the “just works” answer for moderate scale; mention geohash + SQL or quadtree for custom implementations.
  • Distinguish proximity search (read-heavy, index-optimized) from real-time tracking (write-heavy, Redis + stream).

See also: Uber Interview Prep · Lyft Interview Prep · DoorDash Interview Prep · Snap Interview Prep
