What Is a Fleet Management System?
A fleet management system tracks the real-time location and status of thousands of vehicles (trucks, delivery vans, rideshare cars). It enables dispatching, route optimization, driver monitoring, and delivery ETA computation. Examples: Uber’s dispatch system, FedEx tracking, Google Maps fleet tracking. Core challenges: high-frequency location ingestion, geospatial queries (find nearest driver), and real-time status propagation.
System Requirements
Functional
- Ingest GPS location updates from vehicles (every 5 seconds per vehicle)
- Find nearest available drivers to a pickup location
- Compute and update ETAs in real-time
- Track vehicle status: available, en_route, offline
- Display live vehicle positions on a map
- Store 90-day trip history for analytics
Non-Functional
- 1M active vehicles, 200K location updates/sec
- Nearest-driver query in under 100ms
- 90-day history: 1M vehicles × 17,280 updates/day (86,400 s ÷ 5 s) × 90 days ≈ 1.56 trillion rows → columnar storage
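The storage estimate above can be checked with quick arithmetic. The 50-byte row size is an assumption for illustration; the other numbers come from the stated requirements:

```python
# Capacity math for the non-functional requirements above.
VEHICLES = 1_000_000
UPDATE_INTERVAL_S = 5        # one GPS update per vehicle every 5 seconds
RETENTION_DAYS = 90
ROW_BYTES = 50               # assumed: vehicle_id, lat, lon, timestamp, speed, heading

updates_per_sec = VEHICLES // UPDATE_INTERVAL_S              # ingest rate
updates_per_day = VEHICLES * (86_400 // UPDATE_INTERVAL_S)   # rows per day
total_rows = updates_per_day * RETENTION_DAYS                # 90-day row count
total_tb = total_rows * ROW_BYTES / 1e12                     # raw size, TB

print(f"{updates_per_sec:,}/sec, {total_rows:,} rows, ~{total_tb:.0f} TB raw")
```

Parquet compression (covered later) shrinks the raw figure considerably, but the row count is what motivates columnar storage.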
Location Ingestion Pipeline
Vehicle ──GPS update──► Kafka (vehicle_locations topic)
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
   Redis GeoSet      Flink stream      S3/Parquet
 (current position)  (ETA compute)   (trip history)
Kafka is partitioned by vehicle_id, which guarantees in-order processing per vehicle. Each partition is consumed by a location processor that updates Redis and triggers the downstream jobs.
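The ordering guarantee comes from keyed partitioning: every update for a given vehicle_id hashes to the same partition. A simplified stand-in for that mapping (Kafka's Java client actually uses murmur2, and the partition count of 64 is an assumed value):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count for the vehicle_locations topic

def partition_for(vehicle_id, num_partitions=NUM_PARTITIONS):
    """Stable key -> partition mapping. md5 stands in for Kafka's murmur2;
    what matters is determinism: the same vehicle_id always maps to the same
    partition, so its GPS updates are consumed in order."""
    digest = hashlib.md5(vehicle_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Same vehicle, same partition, every time:
assert partition_for("veh-42") == partition_for("veh-42")
```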
Real-Time Location Store: Redis GeoSet
Redis GEOADD stores longitude/latitude pairs in a sorted set whose score is an internal 52-bit geohash. Operations:
GEOADD vehicles_active longitude latitude vehicle_id
GEORADIUS vehicles_active lng lat 5 km COUNT 10 ASC
GEOPOS vehicles_active vehicle_id
GEODIST vehicles_active v1 v2 km
GEORADIUS returns vehicles within the given radius sorted by distance, which is exactly what a nearest-driver query needs. Documented complexity is O(N + log M), where N is the number of elements inside the bounding box of the search area and M is the total members of the set. For 1M vehicles, a 5 km radius in a dense city might examine roughly 50K candidates: still fast. (Since Redis 6.2, GEORADIUS is deprecated in favor of GEOSEARCH, which expresses the same query as GEOSEARCH vehicles_active FROMLONLAT lng lat BYRADIUS 5 km ASC COUNT 10.)
Sharding Redis GeoSets
A single Redis GeoSet can hold all 1M vehicles, but at 200K writes/sec (1M vehicles reporting every 5 s) one Redis node saturates at roughly 100K ops/sec. Solution: shard by geohash prefix. A precision-1 geohash divides the world into 32 cells; each vehicle belongs to one cell based on its current coordinates, and writes and queries are routed to the shard that owns that cell. Queries near cell boundaries must fan out to adjacent cells, so the query path includes a boundary check. Because a geohash is a prefix code, nearby vehicles usually share a prefix and therefore a shard.
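To make the routing concrete, here is a minimal standard geohash encoder; the one-shard-per-precision-1-cell mapping in `shard_for` is an illustrative deployment choice, not the only option:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash(lat, lon, precision=1):
    """Standard geohash: binary-search the lat/lon ranges, interleave the
    bits (longitude first), and emit one base32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bits = (bits << 1) | (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bits = (bits << 1) | (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even, nbits = not even, nbits + 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def shard_for(lat, lon):
    # One Redis shard per precision-1 cell (32 shards), as described above.
    return geohash(lat, lon, precision=1)

print(shard_for(57.64911, 10.40744))  # → "u" (northern-Europe cell)
```

Production systems typically use finer precision for the key prefix and map many cells onto fewer physical shards, but the routing principle is the same.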
Nearest Driver Query
def find_nearest_drivers(pickup_lat, pickup_lon, radius_km=5, limit=10):
    cell = geohash_cell(pickup_lat, pickup_lon)
    adjacent_cells = get_neighbors(cell)
    results = []
    for c in [cell] + adjacent_cells:
        redis_shard = get_shard(c)
        # withdist=True returns (member, distance) pairs, which makes the
        # cross-shard merge below possible.
        drivers = redis_shard.georadius(
            f'vehicles:{c}', pickup_lon, pickup_lat,
            radius_km, unit='km', withdist=True,
            sort='ASC', count=limit,
        )
        results.extend(drivers)
    # Merge the per-shard results: sort by distance, keep the closest `limit`.
    results.sort(key=lambda d: d[1])
    return [d[0] for d in results[:limit]]
ETA Computation
ETA has two components: routing time (map graph shortest path from driver to pickup to destination) and external factors (traffic, time of day, historical congestion). At scale: use precomputed routing tiles + real-time traffic overlays. For interviews: describe Dijkstra on a road graph with time-of-day weighted edges. Update ETAs every 30 seconds as traffic changes.
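The interview-level version can be sketched directly. The tiny road graph and the 1.5x rush-hour multiplier below are illustrative assumptions, not real traffic data:

```python
import heapq

def edge_weight(base_seconds, hour):
    # Assumed traffic model: edges are 50% slower during rush hour (8-10, 17-19).
    rush = 8 <= hour < 10 or 17 <= hour < 19
    return base_seconds * (1.5 if rush else 1.0)

def eta_seconds(graph, source, target, hour):
    """Dijkstra over graph = {node: [(neighbor, base_travel_seconds), ...]},
    with edge weights adjusted for time of day."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, base in graph.get(node, []):
            nd = d + edge_weight(base, hour)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# Hypothetical road graph: driver reaches the pickup via two candidate routes.
roads = {
    "driver": [("a", 120), ("b", 300)],
    "a": [("pickup", 180)],
    "b": [("pickup", 60)],
}
print(eta_seconds(roads, "driver", "pickup", hour=3))  # off-peak: 300.0 s
print(eta_seconds(roads, "driver", "pickup", hour=8))  # rush hour: 450.0 s
```

Note how rush hour can change which route wins; this is why ETAs are recomputed as traffic conditions update rather than cached per route.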
Trip History Storage
1.5 trillion location rows call for columnar storage: Apache Parquet files on S3, queried with AWS Athena or Spark SQL. Partition by date and bucket (or sort) by vehicle_id within each date partition; partitioning directly on vehicle_id would create on the order of 90 million tiny partitions, which object stores and query planners handle poorly. Per-vehicle, per-day retrieval stays fast because Parquet row-group statistics on vehicle_id let the engine skip irrelevant files. Compaction: the small Parquet files produced by streaming are compacted daily into larger files for efficient analytics.
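One possible S3 layout, sketched below. Partitioning on date with vehicle_id hashed into a fixed number of buckets keeps the partition count bounded while still letting the engine prune most files for a per-vehicle read; the bucket count of 256, the bucket naming, and the path format are all illustrative choices:

```python
import zlib
from datetime import date

NUM_BUCKETS = 256  # assumed: bounds partitions at 256 per day

def history_path(vehicle_id, day, buckets=NUM_BUCKETS):
    """Directory prefix for one vehicle's Parquet data on one day.
    crc32 is deterministic across processes (unlike Python's salted hash()),
    so every writer and reader agrees on the bucket."""
    bucket = zlib.crc32(vehicle_id.encode()) % buckets
    return f"s3://fleet-history/date={day.isoformat()}/bucket={bucket:03d}/"

print(history_path("veh-42", date(2024, 1, 15)))
```

A trip-reconstruction query for one vehicle then scans one bucket per day rather than the whole day's data.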
Interview Tips
- Redis GeoSet + GEORADIUS is the canonical answer for nearest-vehicle queries.
- 200K writes/sec requires sharding — lead with geohash-based sharding.
- Separate hot path (Redis, real-time) from cold path (S3, analytics) explicitly.
- ETA = routing + traffic — acknowledge both components.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does Redis GEORADIUS work and why is it ideal for nearest-vehicle queries?",
"acceptedAnswer": { "@type": "Answer", "text": "Redis GeoSet stores geographic coordinates as a sorted set where the score is a 52-bit geohash encoding of the (longitude, latitude) pair. GEOADD stores a member with its location in O(log N). GEORADIUS performs a circular range query: it determines which geohash cells intersect the search radius, retrieves all members from those cells, then filters by actual distance. Time complexity: O(N+log M) where N is the number of elements inside the bounding box of the search area and M is the total members. For a 5km radius in a city of 50K vehicles, this returns in under 5ms. The key advantage over SQL: no full table scan. SQL spatial indexes (PostGIS) are also fast for this use case, but Redis GeoSet wins on throughput: a single Redis node handles ~100K GEORADIUS queries/sec vs ~10K for PostgreSQL under write load. Use Redis for real-time nearest-driver queries; use PostGIS for complex geospatial analytics (trip heatmaps, polygon containment)." }
},
{
"@type": "Question",
"name": "How do you handle 200,000 vehicle location updates per second?",
"acceptedAnswer": { "@type": "Answer", "text": "A single Redis node handles ~100K writes/sec. For 200K/sec: shard by geohash. Divide the map into a grid of cells (e.g., 32 cells using geohash precision 1). Each vehicle update goes to the Redis shard that owns its current cell. When the vehicle crosses a cell boundary, update the old shard (ZREM) and the new shard (GEOADD). Nearest-vehicle queries must check the target cell and its neighbors (up to 8 adjacent cells = 9 total), fanning out to at most 9 shards. In practice, a 5km radius usually touches 1-4 shards. For even higher throughput: add a write buffer. Accept GPS updates in Kafka (partitioned by vehicle_id for ordering). A fleet of processors reads from Kafka and batches GEOADD commands using Redis pipelining, sending 100 commands in one network round trip. Pipelining increases throughput roughly 10x over individual commands." }
},
{
"@type": "Question",
"name": "How do you store and query 90 days of vehicle location history at multi-terabyte scale?",
"acceptedAnswer": { "@type": "Answer", "text": "With 1M vehicles updating every 5 seconds, location history is 1M * 17,280 updates/day = 17.28 billion rows/day. At 50 bytes/row (vehicle_id, lat, lon, timestamp, speed, heading), that is 864 GB/day or ~78 TB over 90 days. A single relational database cannot handle this economically. Solution: columnar storage on object storage. Write location updates to Kafka; Flink aggregates into 1-minute micro-batches and writes Parquet files to S3, partitioned by date and bucketed by vehicle_id. Parquet's columnar format compresses location data 5-10x (repeated lat/lon values for stationary vehicles). Query via AWS Athena or Apache Spark. For per-vehicle trip reconstruction: query WHERE vehicle_id = X AND date BETWEEN Y AND Z, which is efficient because partition pruning plus Parquet row-group statistics limit the scan to relevant files. For analytics (heatmaps, congestion zones): Spark aggregation over all files for a day. Keep Redis only for current position (a 30-minute TTL; if no update arrives, the vehicle is considered offline)." }
}
]
}