Overview
Live location tracking ingests continuous GPS updates from millions of mobile devices and serves each device's current position to clients in real time. Uber tracks 5M+ active drivers globally during peak hours, each sending a location update every 4 seconds. Use cases: a rider watching the assigned driver approach (ETA, map display), dispatch finding the nearest available drivers, surge pricing computation, and fleet analytics.
Location Update Ingestion
Each active driver sends a GPS fix (lat, lng, accuracy, heading, speed, timestamp) every 4 seconds. At 5M drivers × 1 update/4s, that is 1.25M updates/second, a write volume that far exceeds what a single traditional relational database can sustain. Pipeline:
- The driver mobile app sends each update over UDP or a WebSocket to the geographically closest ingestion endpoint (Anycast routing or GeoDNS).
- The endpoint publishes the update to Kafka (topic: driver_location_updates). Kafka absorbs 1.25M messages/second across partitioned topics; partitioning by driver_id preserves ordering within each driver's stream.
- A stream processor (Flink or Spark Streaming) consumes from Kafka and writes the latest location per driver to a fast key-value store.
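A minimal sketch of that consumer stage, assuming kafka-python and redis-py 4.x, JSON-encoded updates, and the topic and key names above. A production deployment would run this as a Flink job with checkpointing and batched Redis pipelines:

```python
import json

import redis
from kafka import KafkaConsumer  # kafka-python

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

consumer = KafkaConsumer(
    "driver_location_updates",
    bootstrap_servers="localhost:9092",
    group_id="location-writer",
    value_deserializer=lambda b: json.loads(b),
)

for msg in consumer:
    u = msg.value  # e.g. {"driver_id": "d42", "lat": 37.77, "lng": -122.41, "ts": 1700000000}
    # Latest-location key with a 30-second TTL: it expires on its own
    # if the driver goes silent (offline or trip ended).
    r.set(f"driver:location:{u['driver_id']}",
          f"{u['lat']},{u['lng']},{u['ts']}", ex=30)
    # Geo index for radius queries; note GEOADD takes lng before lat.
    r.geoadd("drivers_geo", (u["lng"], u["lat"], u["driver_id"]))
```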
Location Storage: Redis for Current State
For "where is driver X right now?" queries, only the latest location matters, so Redis is a natural fit: SET driver:location:{driver_id} "{lat},{lng},{timestamp}" EX 30. If a driver stops sending updates, the key expires automatically after 30 seconds, signaling that the driver is offline or the trip has ended.
For geospatial queries ("find all drivers within 5km of this rider"), Redis geo commands store positions in a sorted set using geohash encoding: GEOADD drivers_geo {lng} {lat} {driver_id}. GEORADIUS (or GEOSEARCH in Redis 6.2+) returns members within a radius: GEOSEARCH drivers_geo FROMLONLAT {rider_lng} {rider_lat} BYRADIUS 5 km ASC (FROMMEMBER only works for members already in the set, and riders are not stored in drivers_geo). This runs in O(N + log M), where N is the number of members in the searched area and M is the total number of members. For 5M drivers, a 5km GEOSEARCH typically returns 50-200 drivers in under 5ms.
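The query side as a redis-py 4.x sketch (the function name and example coordinates are mine):

```python
import redis

r = redis.Redis(decode_responses=True)

def drivers_near(rider_lat: float, rider_lng: float, radius_km: float = 5.0):
    # GEOSEARCH drivers_geo FROMLONLAT <lng> <lat> BYRADIUS <km> km ASC WITHDIST
    return r.geosearch(
        "drivers_geo",
        longitude=rider_lng,
        latitude=rider_lat,
        radius=radius_km,
        unit="km",
        sort="ASC",
        withdist=True,
    )

# e.g. [['d42', 0.31], ['d7', 1.85], ...] -- member plus distance in km
print(drivers_near(37.7749, -122.4194))
```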
Geohashing for Partitioning
A geohash encodes a lat/lng coordinate into a short alphanumeric string in which shared prefixes usually indicate geographic proximity. Precision: 4 characters ≈ 40km × 20km cell; 6 characters ≈ 1.2km × 0.6km cell; 8 characters ≈ 40m × 20m cell. Partitioning by geohash prefix (first 4-6 characters) routes queries to the server responsible for that geographic region, and nearby lookups usually land on the same or adjacent servers. Challenge: prefix proximity is not guaranteed at cell boundaries; two drivers a few meters apart but on opposite sides of a boundary can have entirely different geohashes and live on different servers. Solution: always query the target cell plus its 8 neighbors (3×3 grid) to guarantee completeness, as sketched below.
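A self-contained sketch of geohash encoding and the 3×3 neighbor expansion; no external library, and the function names are mine:

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits, even = [], True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            bits.append(1 if lng >= mid else 0)
            lng_lo, lng_hi = (mid, lng_hi) if lng >= mid else (lng_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(
        _BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

def neighbors_3x3(lat: float, lng: float, precision: int = 6) -> set:
    """Cells to query: the target cell plus its 8 neighbors.
    (Sketch only: does not wrap at the antimeridian or clamp at the poles.)"""
    total = precision * 5
    dlng = 360.0 / (1 << math.ceil(total / 2))  # cell width in degrees
    dlat = 180.0 / (1 << (total // 2))          # cell height in degrees
    return {
        geohash_encode(lat + i * dlat, lng + j * dlng, precision)
        for i in (-1, 0, 1) for j in (-1, 0, 1)
    }
```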
Alternative: the Google S2 library uses a Hilbert space-filling curve to map Earth's surface to 64-bit integers, providing better neighbor queries and adaptive cell sizes. H3 (Uber's hexagonal grid) divides the Earth into hexagonal cells; each hexagon has 6 neighbors, all at the same center-to-center distance (a square grid mixes 4 edge neighbors with 4 diagonal neighbors at a greater distance), which simplifies routing and surge pricing calculations.
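A two-line illustration, assuming the h3-py v4 API (v3 names these functions geo_to_h3 and k_ring):

```python
import h3

# Index a point into a resolution-7 hexagon, then fetch the cell plus
# its 6 neighbors -- the hexagonal analogue of the 3x3 geohash grid.
cell = h3.latlng_to_cell(37.7749, -122.4194, 7)
ring = h3.grid_disk(cell, 1)  # 7 cells: the target and its 6 neighbors
print(cell, len(ring))
```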
Matching Service: Finding Nearby Drivers
When a rider requests a ride, the matching service queries Redis GEOSEARCH for available (not on a trip, not offline) drivers within a 5km radius. Filters: minimum battery level, vehicle type (UberX vs. UberXL), driver rating. It returns the top 5 candidates ranked by ETA (straight-line proximity refined by road-network travel time, estimated via pre-computed routing tables or a lightweight OSRM request). The matching service dispatches an offer to the top candidate; if accepted, the match is confirmed. If the driver rejects or does not respond within 5 seconds, the offer goes to the next candidate (flow sketched below).
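A sketch of that offer loop under the assumptions above; is_eligible, estimate_eta, and send_offer are placeholder stubs for services the section describes elsewhere:

```python
from dataclasses import dataclass

import redis

r = redis.Redis(decode_responses=True)

@dataclass
class Rider:
    rider_id: str
    lat: float
    lng: float
    vehicle_type: str = "uberx"

def is_eligible(driver_id: str, rider: Rider) -> bool:
    # Placeholder: would check availability, vehicle type, rating, and
    # battery level against the driver's profile.
    return True

def estimate_eta(driver_id: str, rider: Rider) -> float:
    # Placeholder: pre-computed routing tables, OSRM, or an ML model.
    return 0.0

def send_offer(driver_id: str, rider: Rider, timeout_s: int) -> bool:
    # Placeholder: push an offer and wait for accept/reject/timeout.
    return True

def match(rider: Rider):
    nearby = r.geosearch("drivers_geo", longitude=rider.lng,
                         latitude=rider.lat, radius=5, unit="km", sort="ASC")
    candidates = sorted(
        (d for d in nearby if is_eligible(d, rider)),
        key=lambda d: estimate_eta(d, rider),
    )[:5]
    for driver in candidates:
        if send_offer(driver, rider, timeout_s=5):
            return driver  # match confirmed
    return None  # no match: widen the radius or retry
```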
ETA Computation
Straight-line distance is inaccurate for urban routing (blocked streets, one-ways, traffic). ETA uses: (1) road graph with edge weights from historical travel times, updated by real-time traffic data (GPS traces from all active drivers provide live speed observations on each road segment). (2) Pre-computed SSSP (Single Source Shortest Paths) from popular pickup zones using Dijkstra/A*. (3) For high-QPS matching (thousands of rides/minute), ETA is approximated with a lightweight ML model that takes (pickup_lat, pickup_lng, driver_lat, driver_lng, time_of_day, day_of_week) as features and outputs ETA in seconds — trained on historical actual ETAs. This avoids expensive graph traversal per match query.
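A minimal Dijkstra SSSP over a toy road graph, illustrating step (2); edge weights are travel times in seconds:

```python
import heapq

def sssp(graph: dict, source: str) -> dict:
    """Single-source shortest paths over a road graph whose edge weights
    are travel times in seconds (historical averages blended with live
    speeds). Pre-compute from popular pickup zones and cache the results."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, travel_time in graph.get(node, {}).items():
            nd = d + travel_time
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Toy road graph: intersection -> {neighbor: travel time in seconds}
road_graph = {
    "airport": {"hwy_101": 120, "local_rd": 300},
    "hwy_101": {"downtown": 480},
    "local_rd": {"downtown": 540},
    "downtown": {},
}
print(sssp(road_graph, "airport"))  # {'airport': 0.0, 'hwy_101': 120.0, ...}
```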
Location History for Analytics
Redis stores only current location. For analytics (heatmaps, surge pricing zones, driver behavior analysis), location history is written to a time-series database or columnar store: Apache Parquet files in S3, partitioned by (date, hour, geohash_prefix), queried by Presto/Athena. Retention: raw 4-second GPS traces stored for 90 days; aggregated (per-minute averages per H3 cell) stored indefinitely for business intelligence.
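A sketch of the partitioned Parquet layout using pyarrow; the bucket name and rows are placeholders, and writing to an s3:// path assumes pyarrow's S3 filesystem support is configured:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# One micro-batch of raw GPS traces (toy data; real batches come from
# the Kafka/Flink pipeline above).
table = pa.table({
    "driver_id": ["d1", "d2"],
    "lat": [37.7749, 37.7810],
    "lng": [-122.4194, -122.4110],
    "ts": [1700000000, 1700000004],
    "date": ["2023-11-14", "2023-11-14"],
    "hour": [22, 22],
    "geohash4": ["9q8y", "9q8y"],
})

# Hive-style layout: s3://<bucket>/traces/date=.../hour=.../geohash4=.../*.parquet,
# which Presto/Athena can prune by partition column.
pq.write_to_dataset(
    table,
    root_path="s3://my-bucket/traces",  # bucket name is a placeholder
    partition_cols=["date", "hour", "geohash4"],
)
```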
Surge Pricing via Location Data
Surge pricing increases fares when demand exceeds supply in a geographic area. Computation: a Flink streaming job aggregates location data into H3 hexagonal cells (resolution 7, ~5 km²/cell). For each cell it computes supply_count (available drivers in the cell) and demand_proxy (ride requests from riders in the cell over the last 5 minutes), then surge_multiplier = f(demand/supply); for example, if demand/supply > 2, apply a 1.5× multiplier. Multipliers are recomputed every 30 seconds, published to Redis (one key per H3 cell), and served to the app for the price estimate shown to riders.
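A condensed sketch of the aggregate-and-publish step, again assuming h3-py v4 and redis-py; the step function inside surge_multiplier is illustrative, not Uber's actual f():

```python
import h3
import redis

r = redis.Redis(decode_responses=True)
RES = 7  # H3 resolution 7, roughly 5 km^2 per hexagon

def surge_multiplier(demand: int, supply: int) -> float:
    """Illustrative step function; the real f() is tuned per market."""
    if supply == 0:
        return 2.0 if demand > 0 else 1.0
    ratio = demand / supply
    if ratio > 3:
        return 2.0
    if ratio > 2:
        return 1.5
    return 1.0

def publish_surge(driver_positions, ride_requests):
    """driver_positions / ride_requests: iterables of (lat, lng) pairs.
    In production this logic runs inside a Flink job over 30s windows."""
    supply, demand = {}, {}
    for lat, lng in driver_positions:
        cell = h3.latlng_to_cell(lat, lng, RES)
        supply[cell] = supply.get(cell, 0) + 1
    for lat, lng in ride_requests:
        cell = h3.latlng_to_cell(lat, lng, RES)
        demand[cell] = demand.get(cell, 0) + 1
    for cell in set(supply) | set(demand):
        m = surge_multiplier(demand.get(cell, 0), supply.get(cell, 0))
        r.set(f"surge:{cell}", m, ex=60)  # expires if not refreshed
```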