What Is a Fleet Management System?
A fleet management system tracks the real-time location and status of thousands of vehicles (trucks, delivery vans, rideshare cars). It enables dispatching, route optimization, driver monitoring, and delivery ETA computation. Examples: Uber’s dispatch system, FedEx tracking, Google Maps fleet tracking. Core challenges: high-frequency location ingestion, geospatial queries (find nearest driver), and real-time status propagation.
System Requirements
Functional
- Ingest GPS location updates from vehicles (every 5 seconds per vehicle)
- Find nearest available drivers to a pickup location
- Compute and update ETAs in real-time
- Track vehicle status: available, en_route, offline
- Display live vehicle positions on a map
- Store 90-day trip history for analytics
Non-Functional
- 1M active vehicles, 200K location updates/sec
- Nearest-driver query in under 100ms
- 90-day history: 1M vehicles × 17,280 updates/day (86,400 s ÷ 5 s) × 90 days ≈ 1.56 trillion rows → columnar storage
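The storage estimate above can be checked with quick arithmetic. The 50-byte row size is an assumption for illustration; the other numbers come from the stated requirements:

```python
# Capacity math for the non-functional requirements above.
VEHICLES = 1_000_000
UPDATE_INTERVAL_S = 5        # one GPS update per vehicle every 5 seconds
RETENTION_DAYS = 90
ROW_BYTES = 50               # assumed: vehicle_id, lat, lon, timestamp, speed, heading

updates_per_sec = VEHICLES // UPDATE_INTERVAL_S              # ingest rate
updates_per_day = VEHICLES * (86_400 // UPDATE_INTERVAL_S)   # rows per day
total_rows = updates_per_day * RETENTION_DAYS                # 90-day row count
total_tb = total_rows * ROW_BYTES / 1e12                     # raw size, TB

print(f"{updates_per_sec:,}/sec, {total_rows:,} rows, ~{total_tb:.0f} TB raw")
```

Parquet compression (covered later) shrinks the raw figure considerably, but the row count is what motivates columnar storage.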
Location Ingestion Pipeline
Vehicle ──GPS update──► Kafka (vehicle_locations topic)
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
   Redis GeoSet      Flink stream      S3/Parquet
 (current position)  (ETA compute)   (trip history)
Kafka is partitioned by vehicle_id, which guarantees in-order processing per vehicle. Each partition is consumed by a location processor that updates Redis and triggers the downstream jobs.
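The ordering guarantee comes from keyed partitioning: every update for a given vehicle_id hashes to the same partition. A simplified stand-in for that mapping (Kafka's Java client actually uses murmur2, and the partition count of 64 is an assumed value):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count for the vehicle_locations topic

def partition_for(vehicle_id, num_partitions=NUM_PARTITIONS):
    """Stable key -> partition mapping. md5 stands in for Kafka's murmur2;
    what matters is determinism: the same vehicle_id always maps to the same
    partition, so its GPS updates are consumed in order."""
    digest = hashlib.md5(vehicle_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Same vehicle, same partition, every time:
assert partition_for("veh-42") == partition_for("veh-42")
```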
Real-Time Location Store: Redis GeoSet
Redis GEOADD stores longitude/latitude pairs in a sorted set whose score is an internal 52-bit geohash. Operations:
GEOADD vehicles_active longitude latitude vehicle_id
GEORADIUS vehicles_active lng lat 5 km COUNT 10 ASC
GEOPOS vehicles_active vehicle_id
GEODIST vehicles_active v1 v2 km
GEORADIUS returns vehicles within the given radius sorted by distance, which is exactly what a nearest-driver query needs. Documented complexity is O(N + log M), where N is the number of elements inside the bounding box of the search area and M is the total members of the set. For 1M vehicles, a 5 km radius in a dense city might examine roughly 50K candidates: still fast. (Since Redis 6.2, GEORADIUS is deprecated in favor of GEOSEARCH, which expresses the same query as GEOSEARCH vehicles_active FROMLONLAT lng lat BYRADIUS 5 km ASC COUNT 10.)
Sharding Redis GeoSets
A single Redis GeoSet can hold all 1M vehicles, but at 200K writes/sec (1M vehicles reporting every 5 s) one Redis node saturates at roughly 100K ops/sec. Solution: shard by geohash prefix. A precision-1 geohash divides the world into 32 cells; each vehicle belongs to one cell based on its current coordinates, and writes and queries are routed to the shard that owns that cell. Queries near cell boundaries must fan out to adjacent cells, so the query path includes a boundary check. Because a geohash is a prefix code, nearby vehicles usually share a prefix and therefore a shard.
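To make the routing concrete, here is a minimal standard geohash encoder; the one-shard-per-precision-1-cell mapping in `shard_for` is an illustrative deployment choice, not the only option:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash(lat, lon, precision=1):
    """Standard geohash: binary-search the lat/lon ranges, interleave the
    bits (longitude first), and emit one base32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bits = (bits << 1) | (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bits = (bits << 1) | (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even, nbits = not even, nbits + 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def shard_for(lat, lon):
    # One Redis shard per precision-1 cell (32 shards), as described above.
    return geohash(lat, lon, precision=1)

print(shard_for(57.64911, 10.40744))  # → "u" (northern-Europe cell)
```

Production systems typically use finer precision for the key prefix and map many cells onto fewer physical shards, but the routing principle is the same.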
Nearest Driver Query
def find_nearest_drivers(pickup_lat, pickup_lon, radius_km=5, limit=10):
    cell = geohash_cell(pickup_lat, pickup_lon)
    adjacent_cells = get_neighbors(cell)
    results = []
    for c in [cell] + adjacent_cells:
        redis_shard = get_shard(c)
        # withdist=True returns (member, distance) pairs, which makes the
        # cross-shard merge below possible.
        drivers = redis_shard.georadius(
            f'vehicles:{c}', pickup_lon, pickup_lat,
            radius_km, unit='km', withdist=True,
            sort='ASC', count=limit,
        )
        results.extend(drivers)
    # Merge the per-shard results: sort by distance, keep the closest `limit`.
    results.sort(key=lambda d: d[1])
    return [d[0] for d in results[:limit]]
ETA Computation
ETA has two components: routing time (map graph shortest path from driver to pickup to destination) and external factors (traffic, time of day, historical congestion). At scale: use precomputed routing tiles + real-time traffic overlays. For interviews: describe Dijkstra on a road graph with time-of-day weighted edges. Update ETAs every 30 seconds as traffic changes.
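The interview-level version can be sketched directly. The tiny road graph and the 1.5x rush-hour multiplier below are illustrative assumptions, not real traffic data:

```python
import heapq

def edge_weight(base_seconds, hour):
    # Assumed traffic model: edges are 50% slower during rush hour (8-10, 17-19).
    rush = 8 <= hour < 10 or 17 <= hour < 19
    return base_seconds * (1.5 if rush else 1.0)

def eta_seconds(graph, source, target, hour):
    """Dijkstra over graph = {node: [(neighbor, base_travel_seconds), ...]},
    with edge weights adjusted for time of day."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, base in graph.get(node, []):
            nd = d + edge_weight(base, hour)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# Hypothetical road graph: driver reaches the pickup via two candidate routes.
roads = {
    "driver": [("a", 120), ("b", 300)],
    "a": [("pickup", 180)],
    "b": [("pickup", 60)],
}
print(eta_seconds(roads, "driver", "pickup", hour=3))  # off-peak: 300.0 s
print(eta_seconds(roads, "driver", "pickup", hour=8))  # rush hour: 450.0 s
```

Note how rush hour can change which route wins; this is why ETAs are recomputed as traffic conditions update rather than cached per route.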
Trip History Storage
1.5 trillion location rows call for columnar storage: Apache Parquet files on S3, queried with AWS Athena or Spark SQL. Partition by date and bucket (or sort) by vehicle_id within each date partition; partitioning directly on vehicle_id would create on the order of 90 million tiny partitions, which object stores and query planners handle poorly. Per-vehicle, per-day retrieval stays fast because Parquet row-group statistics on vehicle_id let the engine skip irrelevant files. Compaction: the small Parquet files produced by streaming are compacted daily into larger files for efficient analytics.
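One possible S3 layout, sketched below. Partitioning on date with vehicle_id hashed into a fixed number of buckets keeps the partition count bounded while still letting the engine prune most files for a per-vehicle read; the bucket count of 256, the bucket naming, and the path format are all illustrative choices:

```python
import zlib
from datetime import date

NUM_BUCKETS = 256  # assumed: bounds partitions at 256 per day

def history_path(vehicle_id, day, buckets=NUM_BUCKETS):
    """Directory prefix for one vehicle's Parquet data on one day.
    crc32 is deterministic across processes (unlike Python's salted hash()),
    so every writer and reader agrees on the bucket."""
    bucket = zlib.crc32(vehicle_id.encode()) % buckets
    return f"s3://fleet-history/date={day.isoformat()}/bucket={bucket:03d}/"

print(history_path("veh-42", date(2024, 1, 15)))
```

A trip-reconstruction query for one vehicle then scans one bucket per day rather than the whole day's data.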
Interview Tips
- Redis GeoSet + GEORADIUS is the canonical answer for nearest-vehicle queries.
- 200K writes/sec requires sharding — lead with geohash-based sharding.
- Separate hot path (Redis, real-time) from cold path (S3, analytics) explicitly.
- ETA = routing + traffic — acknowledge both components.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does Redis GEORADIUS work and why is it ideal for nearest-vehicle queries?",
"acceptedAnswer": { "@type": "Answer", "text": "Redis GeoSet stores geographic coordinates as a sorted set where the score is a 52-bit geohash encoding of the (longitude, latitude) pair. GEOADD stores a member with its location in O(log N). GEORADIUS performs a circular range query: it determines which geohash cells intersect the search radius, retrieves all members from those cells, then filters by actual distance. Time complexity: O(N+log M) where N is the number of elements inside the bounding box of the search area and M is the total members. For a 5km radius in a city of 50K vehicles, this returns in under 5ms. The key advantage over SQL: no full table scan. SQL spatial indexes (PostGIS) are also fast for this use case, but Redis GeoSet wins on throughput: a single Redis node handles ~100K GEORADIUS queries/sec vs ~10K for PostgreSQL under write load. Use Redis for real-time nearest-driver queries; use PostGIS for complex geospatial analytics (trip heatmaps, polygon containment)." }
},
{
"@type": "Question",
"name": "How do you handle 200,000 vehicle location updates per second?",
"acceptedAnswer": { "@type": "Answer", "text": "A single Redis node handles ~100K writes/sec. For 200K/sec: shard by geohash. Divide the map into a grid of cells (e.g., 32 cells using geohash precision 1). Each vehicle update goes to the Redis shard that owns its current cell. When the vehicle crosses a cell boundary, update the old shard (ZREM) and the new shard (GEOADD). Nearest-vehicle queries must check the target cell and its neighbors (up to 8 adjacent cells = 9 total), fanning out to at most 9 shards. In practice, a 5km radius usually touches 1-4 shards. For even higher throughput: add a write buffer. Accept GPS updates in Kafka (partitioned by vehicle_id for ordering). A fleet of processors reads from Kafka and batches GEOADD commands using Redis pipelining, sending 100 commands in one network round trip. Pipelining increases throughput roughly 10x over individual commands." }
},
{
"@type": "Question",
"name": "How do you store and query 90 days of vehicle location history at multi-terabyte scale?",
"acceptedAnswer": { "@type": "Answer", "text": "With 1M vehicles updating every 5 seconds, location history is 1M * 17,280 updates/day = 17.28 billion rows/day. At 50 bytes/row (vehicle_id, lat, lon, timestamp, speed, heading), that is 864 GB/day or ~78 TB over 90 days. A single relational database cannot handle this economically. Solution: columnar storage on object storage. Write location updates to Kafka; Flink aggregates into 1-minute micro-batches and writes Parquet files to S3, partitioned by date and bucketed by vehicle_id. Parquet's columnar format compresses location data 5-10x (repeated lat/lon values for stationary vehicles). Query via AWS Athena or Apache Spark. For per-vehicle trip reconstruction: query WHERE vehicle_id = X AND date BETWEEN Y AND Z, which is efficient because partition pruning plus Parquet row-group statistics limit the scan to relevant files. For analytics (heatmaps, congestion zones): Spark aggregation over all files for a day. Keep Redis only for current position (a 30-minute TTL; if no update arrives, the vehicle is considered offline)." }
}
]
}