System Design Interview: Live Location Tracking (Uber / Lyft Driver GPS)

Overview

Live location tracking ingests continuous GPS updates from millions of mobile devices and serves current location data to nearby clients in real time. Uber tracks 5M+ active drivers globally during peak hours, each sending location updates every 4 seconds. Use cases: rider seeing driver approach (ETA, map display), dispatch finding nearest available drivers, surge pricing computation, and fleet analytics.

Location Update Ingestion

Each active driver sends a GPS coordinate (lat, lng, accuracy, heading, speed, timestamp) every 4 seconds. At 5M drivers × 1 update/4s = 1.25M updates/second. This write volume exceeds what a traditional relational database can handle. Pipeline:

  1. Driver mobile app sends a UDP or WebSocket message to the geographically closest location-update endpoint (selected via Anycast routing or GeoDNS).
  2. Endpoint publishes the update to Kafka (topic: driver_location_updates). Kafka handles 1.25M messages/second across partitioned topics — partition by driver_id for ordering within a driver.
  3. A stream processor (Flink or Spark Streaming) consumes from Kafka and writes the latest location per driver to a fast key-value store.
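The per-driver ordering guarantee in step 2 comes from keyed partitioning: every update for a given driver hashes to the same partition. A minimal sketch of that routing logic (the partition count and md5-based hash are illustrative; Kafka's default partitioner actually uses murmur2 on the key):

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical partition count for driver_location_updates


def partition_for(driver_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of driver_id -> partition index, so all updates for one
    driver land on the same partition and remain ordered relative to each other."""
    digest = hashlib.md5(driver_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Every update for the same driver routes to the same partition:
assert partition_for("driver-42") == partition_for("driver-42")
print(partition_for("driver-42"))
```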

Location Storage: Redis for Current State

For “where is driver X right now?” queries, only the latest location matters. Redis is ideal: SET driver:location:{driver_id} "{lat},{lng},{timestamp}" EX 30 sets a 30-second TTL, so if a driver stops sending updates, their location expires automatically — indicating the driver went offline or the trip ended.
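The set-with-TTL behavior can be modeled with a few lines of Python. This is a toy in-memory stand-in for the Redis pattern above (the class and field names are invented for illustration), showing how expiry doubles as offline detection:

```python
import time


class CurrentLocationStore:
    """Toy stand-in for Redis `SET ... EX 30`: keeps only the latest
    location per driver and treats expired entries as offline."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._data = {}  # driver_id -> (lat, lng, expires_at)

    def set_location(self, driver_id, lat, lng, now=None):
        now = time.time() if now is None else now
        # Overwrites any previous entry and resets the TTL clock.
        self._data[driver_id] = (lat, lng, now + self.ttl)

    def get_location(self, driver_id, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(driver_id)
        if entry is None or entry[2] <= now:
            return None  # never seen, or TTL elapsed -> treated as offline
        return entry[0], entry[1]


store = CurrentLocationStore()
store.set_location("d1", 37.77, -122.42, now=0.0)
print(store.get_location("d1", now=10.0))  # (37.77, -122.42): still fresh
print(store.get_location("d1", now=40.0))  # None: expired after 30s
```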

For geospatial queries (“find all drivers within 5km of this rider”), Redis Geo commands store positions in a sorted set using Geohash encoding: GEOADD drivers_geo {lng} {lat} {driver_id}. GEORADIUS (or GEOSEARCH in Redis 6.2+) returns members within a radius: GEOSEARCH drivers_geo FROMLONLAT {rider_lng} {rider_lat} BYRADIUS 5 km ASC. (FROMMEMBER only works when the center point is itself a member of the set, which a rider is not.) This runs in O(N + log M), where N is the number of members in the bounding box examined and M is the total members in the index. For 5M drivers, a GEOSEARCH with a 5km radius typically returns 50-200 drivers in under 5ms.
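Conceptually, GEOSEARCH with ASC is a radius filter plus a sort by great-circle distance. A linear-scan sketch of the same semantics using the Haversine formula (Redis avoids the full scan via the Geohash-ordered sorted set, but the result is equivalent):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points."""
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def drivers_within(drivers, rider_lat, rider_lng, radius_km=5.0):
    """Linear-scan equivalent of GEOSEARCH ... BYRADIUS 5 km ASC:
    keep members inside the radius, sorted nearest-first."""
    hits = [(haversine_km(lat, lng, rider_lat, rider_lng), did)
            for did, (lat, lng) in drivers.items()]
    return sorted((dist, did) for dist, did in hits if dist <= radius_km)


drivers = {"a": (37.7749, -122.4194),   # at the rider's position
           "b": (37.8044, -122.2712),   # Oakland, ~13 km away
           "c": (37.7790, -122.4100)}   # ~1 km away
print(drivers_within(drivers, 37.7749, -122.4194))  # "a" then "c"; "b" filtered out
```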

Geohashing for Partitioning

A geohash encodes a lat/lng coordinate into a short alphanumeric string where common prefixes indicate geographic proximity. Geohash precision: 4 characters ≈ 40km × 20km cell; 6 characters ≈ 1.2km × 0.6km cell; 8 characters ≈ 40m × 20m cell. Partitioning by geohash prefix (first 4-6 characters) routes queries to the server responsible for that geographic region. Adjacent cells share a common prefix — nearby lookups go to the same or adjacent servers. Challenge: geohash boundaries can create “edge cases” where drivers just across a boundary are on a different server. Solution: always query the target cell plus its 8 neighbors (3×3 grid) to ensure completeness.
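The encoding itself is a short loop: alternately bisect the longitude and latitude ranges, emit one bit per bisection, then base32-encode five bits per character. A sketch of the standard algorithm, checked against the well-known reference vector for (57.64911, 10.40744):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet


def geohash(lat, lng, precision=6):
    """Standard geohash: alternately bisect longitude and latitude,
    one bit per bisection, base32-encoding every 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even-numbered bits encode longitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = bits * 2 + 1, mid
            else:
                bits, lng_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash(57.64911, 10.40744, 11))  # u4pruydqqvj (reference test vector)
```

The prefix property falls out directly: truncating a geohash yields the geohash of the enclosing coarser cell, which is exactly what makes prefix-based partitioning work.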

Alternative: Google S2 library uses a Hilbert space-filling curve to map Earth’s surface to 64-bit integers, providing better neighbor queries and adaptive cell sizes. H3 (Uber’s hexagonal grid) divides the Earth into hexagonal cells — hexagons have 6 equidistant neighbors (versus 4 for squares), improving routing and surge pricing calculations.

Matching Service: Finding Nearby Drivers

When a rider requests a ride, the matching service queries Redis GEOSEARCH for available drivers (not on a trip, not offline) within a 5km radius. Filters: minimum battery level, vehicle type (UberX vs. UberXL), driver rating. It returns the top 5 candidates sorted by ETA (road-network travel time, estimated via pre-computed routing tables or a lightweight OSRM request) and dispatches an offer to the top candidate; if accepted, the match is confirmed. If the driver rejects or does not respond within 5 seconds, the offer goes to the next candidate.
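The filter-then-rank step can be sketched in a few lines. The field names and the 4.6 rating threshold below are invented for illustration, not Uber's actual criteria:

```python
def rank_candidates(drivers, vehicle_type, max_offers=5):
    """Filter available drivers by ride constraints, then return the
    best candidates sorted by estimated pickup ETA (seconds)."""
    eligible = [d for d in drivers
                if d["available"]
                and d["vehicle_type"] == vehicle_type
                and d["rating"] >= 4.6]           # hypothetical quality floor
    return sorted(eligible, key=lambda d: d["eta_s"])[:max_offers]


drivers = [
    {"id": "a", "available": True,  "vehicle_type": "uberx",  "rating": 4.9, "eta_s": 240},
    {"id": "b", "available": True,  "vehicle_type": "uberx",  "rating": 4.8, "eta_s": 120},
    {"id": "c", "available": False, "vehicle_type": "uberx",  "rating": 5.0, "eta_s": 60},
    {"id": "d", "available": True,  "vehicle_type": "uberxl", "rating": 4.9, "eta_s": 90},
]
print([d["id"] for d in rank_candidates(drivers, "uberx")])  # ['b', 'a']
```

Driver "c" is filtered out despite the shortest ETA because they are on a trip, and "d" because the vehicle type does not match.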

ETA Computation

Straight-line distance is inaccurate for urban routing (blocked streets, one-ways, traffic). ETA uses: (1) road graph with edge weights from historical travel times, updated by real-time traffic data (GPS traces from all active drivers provide live speed observations on each road segment). (2) Pre-computed SSSP (Single Source Shortest Paths) from popular pickup zones using Dijkstra/A*. (3) For high-QPS matching (thousands of rides/minute), ETA is approximated with a lightweight ML model that takes (pickup_lat, pickup_lng, driver_lat, driver_lng, time_of_day, day_of_week) as features and outputs ETA in seconds — trained on historical actual ETAs. This avoids expensive graph traversal per match query.
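Approach (2) at its core is shortest-path search over a weighted road graph. A minimal Dijkstra sketch where edge weights are travel times in seconds (the tiny graph is hypothetical; production engines add contraction hierarchies, live traffic weights, and turn restrictions):

```python
import heapq


def eta_seconds(graph, src, dst):
    """Dijkstra over a road graph whose edge weights are travel times
    in seconds. Returns the minimum travel time, or None if unreachable."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None


# Hypothetical road graph: intersections A..D, edge weights in seconds.
road_graph = {
    "A": [("B", 60), ("C", 30)],
    "B": [("D", 30)],
    "C": [("B", 20), ("D", 120)],
    "D": [],
}
print(eta_seconds(road_graph, "A", "D"))  # 80.0: A->C->B->D beats A->B->D (90s)
```

Note the asymmetry a one-way street creates: D has no outgoing edges, so the reverse query D→A is unreachable, which straight-line distance can never capture.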

Location History for Analytics

Redis stores only current location. For analytics (heatmaps, surge pricing zones, driver behavior analysis), location history is written to a time-series database or columnar store: Apache Parquet files in S3, partitioned by (date, hour, geohash_prefix), queried by Presto/Athena. Retention: raw 4-second GPS traces stored for 90 days; aggregated (per-minute averages per H3 cell) stored indefinitely for business intelligence.
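The (date, hour, geohash_prefix) layout maps naturally onto Hive-style partition paths, which is what lets Presto/Athena prune by time window and region before reading any Parquet. A sketch of the object-key scheme (the prefix `gps_traces/` and exact field names are illustrative):

```python
def history_object_key(date: str, hour: int, geohash_prefix: str, part: int) -> str:
    """Hive-style partition layout for raw GPS traces in S3, so query
    engines can prune partitions by date, hour, and geographic region."""
    return (f"gps_traces/date={date}/hour={hour:02d}/"
            f"geohash_prefix={geohash_prefix}/part-{part:05d}.parquet")


print(history_object_key("2024-06-01", 9, "9q8y", 3))
# gps_traces/date=2024-06-01/hour=09/geohash_prefix=9q8y/part-00003.parquet
```

A query like “all traces in downtown SF between 9am and 10am on June 1” then touches a single partition directory instead of 90 days of raw data.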

Surge Pricing via Location Data

Surge pricing increases fares when demand exceeds supply in a geographic area. Computation: a Flink streaming job aggregates location data into H3 hexagonal cells (resolution 7, ~5km²/cell). For each cell, compute supply_count (available drivers in cell) and demand_proxy (ride requests in last 5 minutes from riders in cell). surge_multiplier = f(demand/supply ratio). If demand/supply > 2×, apply 1.5× surge. Updated every 30 seconds. Published to Redis (key per H3 cell) and served to the app for the price estimate shown to riders.
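The per-cell multiplier function f can be sketched as a capped linear ramp on the demand/supply ratio. The thresholds and cap below are illustrative, not Uber's actual pricing curve, but they reproduce the example above (ratio 2× → 1.5× surge):

```python
def surge_multiplier(demand: float, supply: float, cap: float = 3.0) -> float:
    """Map one H3 cell's demand/supply ratio to a fare multiplier.
    1.0 = no surge; multiplier ramps linearly with excess demand, capped."""
    if supply == 0:
        return cap if demand > 0 else 1.0  # riders but no drivers: max surge
    ratio = demand / supply
    if ratio <= 1.0:
        return 1.0                          # supply covers demand: no surge
    return min(cap, 1.0 + 0.5 * (ratio - 1.0))  # smooth ramp above 1x, capped


print(surge_multiplier(demand=20, supply=10))  # 1.5 (2x ratio -> 1.5x surge)
print(surge_multiplier(demand=10, supply=20))  # 1.0 (no surge)
```

A smooth ramp (rather than a step function) avoids price oscillation between recomputation windows when a cell hovers near a threshold.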

