What Is a Fleet Management System?
A fleet management system tracks the real-time location and status of thousands of vehicles (trucks, delivery vans, rideshare cars). It enables dispatching, route optimization, driver monitoring, and delivery ETA computation. Examples: Uber’s dispatch system, FedEx tracking, Google Maps fleet tracking. Core challenges: high-frequency location ingestion, geospatial queries (find nearest driver), and real-time status propagation.
System Requirements
Functional
- Ingest GPS location updates from vehicles (every 5 seconds per vehicle)
- Find nearest available drivers to a pickup location
- Compute and update ETAs in real-time
- Track vehicle status: available, en_route, offline
- Display live vehicle positions on a map
- Store 90-day trip history for analytics
Non-Functional
- 1M active vehicles, 200K location updates/sec
- Nearest-driver query in under 100ms
- 90-day history: 1M vehicles × (86,400 s / 5 s) = 17,280 updates/day per vehicle × 90 days ≈ 1.5 trillion rows, which calls for columnar storage
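The sizing figures above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the 32 bytes per row is an assumed, uncompressed row size and does not come from the requirements.

```python
# Back-of-the-envelope sizing from the stated requirements.
VEHICLES = 1_000_000
UPDATE_INTERVAL_S = 5
RETENTION_DAYS = 90
BYTES_PER_ROW = 32  # assumption: vehicle_id, timestamp, lat, lon, status (uncompressed)

updates_per_sec = VEHICLES // UPDATE_INTERVAL_S
updates_per_vehicle_per_day = 86_400 // UPDATE_INTERVAL_S
rows_retained = VEHICLES * updates_per_vehicle_per_day * RETENTION_DAYS
raw_terabytes = rows_retained * BYTES_PER_ROW / 1e12

print(updates_per_sec)              # 200000
print(updates_per_vehicle_per_day)  # 17280
print(rows_retained)                # 1555200000000
print(round(raw_terabytes, 1))      # 49.8
```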
Location Ingestion Pipeline
Vehicle ──GPS update──► Kafka (vehicle_locations topic)
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
  Redis GeoSet      Flink stream       S3/Parquet
(current position)  (ETA compute)    (trip history)
Partitioning the topic by vehicle_id keeps each vehicle's updates in order, because every message with the same key lands on the same partition. Each partition is consumed by a location processor that updates Redis and triggers downstream jobs.
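To illustrate why keying by vehicle_id preserves per-vehicle order, here is a minimal sketch of key-based partition assignment. It does not talk to Kafka: the partition count is an assumed value, and CRC32 stands in for the murmur2 hash that Kafka's default Java partitioner applies to keyed records.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 64  # assumption: illustrative partition count


def partition_for(vehicle_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vehicle to a partition using a stable hash of its key.

    CRC32 is a stand-in for Kafka's murmur2 so the sketch needs only the
    standard library; the property that matters is determinism.
    """
    return zlib.crc32(vehicle_id.encode("utf-8")) % num_partitions


# Simulated stream of (vehicle_id, sequence_number) updates.
updates = [("v1", 1), ("v2", 1), ("v1", 2), ("v2", 2), ("v1", 3)]

# Append each update to its partition in arrival order, as a broker would.
partitions = defaultdict(list)
for vehicle_id, seq in updates:
    partitions[partition_for(vehicle_id)].append((vehicle_id, seq))
```

Because a given vehicle_id always hashes to the same partition, a single consumer sees that vehicle's updates in the order they were produced, regardless of how many partitions the topic has.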
Real-Time Location Store: Redis GeoSet
The Redis GEOADD command stores each position in a sorted set, using a 52-bit geohash of the longitude and latitude as the score. Operations:
GEOADD vehicles_active longitude latitude vehicle_id
GEORADIUS vehicles_active lng lat 5 km ASC COUNT 10
GEOPOS vehicles_active vehicle_id
GEODIST vehicles_active v1 v2 km
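GEORADIUS returns the vehicles within a given radius, sorted by distance, which is exactly what a nearest-driver query needs. (Since Redis 6.2 the command is deprecated in favor of GEOSEARCH, which supports the same query.) Time complexity is O(N + log M), where N is the number of vehicles inside the bounding box of the search area and M is the total number of vehicles in the set. With M = 1M vehicles, a 5 km query in a dense city might examine N ≈ 50K candidates, which is still fast.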
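The semantics of a radius query can be pinned down with a brute-force reference version. This sketch is not how Redis implements it: it scans every vehicle, whereas Redis uses the geohash ordering to look only at nearby cells. It is useful mainly as a test oracle. The function names and the in-memory `positions` dict are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_query(positions, lat, lon, radius_km, count):
    """Return up to `count` (vehicle_id, distance_km) pairs, nearest first.

    positions: dict mapping vehicle_id -> (lat, lon).
    """
    hits = []
    for vehicle_id, (vlat, vlon) in positions.items():
        d = haversine_km(lat, lon, vlat, vlon)
        if d <= radius_km:
            hits.append((vehicle_id, d))
    hits.sort(key=lambda h: h[1])
    return hits[:count]


positions = {
    "v_far": (40.10, -74.0),   # roughly 11 km north of the pickup
    "v_mid": (40.03, -74.0),   # roughly 3.3 km north
    "v_near": (40.01, -74.0),  # roughly 1.1 km north
}
nearest = radius_query(positions, 40.0, -74.0, radius_km=5, count=10)
print([vid for vid, _ in nearest])  # ['v_near', 'v_mid']
```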
Sharding Redis GeoSets
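A single Redis GeoSet can hold all 1M vehicles, so capacity is not the problem; write throughput is. 1M vehicles reporting every 5 seconds produce 200K writes/sec, while a single Redis node saturates at roughly 100K ops/sec. Solution: shard by geohash prefix. Divide the world into a 4×4 grid (16 cells), which corresponds to a 4-bit geohash prefix, and assign each vehicle to the cell that contains its current coordinates. Route writes and queries to the shard that owns the cell. When a vehicle crosses into a new cell, remove it from the old cell's set as well as adding it to the new one. Queries near a cell boundary must fan out to the adjacent cells, so check whether the search radius crosses a boundary. Vehicles in the same cell share a geohash prefix, so nearby vehicles usually land on the same shard; the exception is vehicles on opposite sides of a cell boundary, which is why the fan-out is needed. Note that equal-sized cells do not carry equal load: dense cities concentrate vehicles, so hot cells may need to be subdivided further.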
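A minimal sketch of the cell and shard routing described above, assuming a plain 4×4 latitude/longitude grid and one shard per cell. The helper names `grid_cell`, `neighbor_cells`, and `shard_for` are hypothetical, and a production system would more likely derive cells from a geohash library.

```python
GRID_ROWS = 4  # latitude bands of 45 degrees each
GRID_COLS = 4  # longitude bands of 90 degrees each


def grid_cell(lat, lon):
    """Return the (row, col) of the grid cell containing the coordinate."""
    # min() clamps the edge values lat == 90 and lon == 180 into the last cell.
    row = min(int((lat + 90.0) / 180.0 * GRID_ROWS), GRID_ROWS - 1)
    col = min(int((lon + 180.0) / 360.0 * GRID_COLS), GRID_COLS - 1)
    return row, col


def neighbor_cells(cell):
    """Return the adjacent cells; columns wrap around the antimeridian, rows do not."""
    row, col = cell
    out = set()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r = row + dr
            if 0 <= r < GRID_ROWS:
                out.add((r, (col + dc) % GRID_COLS))
    return sorted(out)


def shard_for(cell):
    """Map a cell to a shard index in [0, 16); assumes one shard per cell."""
    row, col = cell
    return row * GRID_COLS + col


cell = grid_cell(40.7, -74.0)  # New York City
print(cell, shard_for(cell))   # (2, 1) 9
```

With only 16 cells, a cell plus its neighbors covers more than half the shards, so the boundary check matters in practice: fan out only when the search radius actually crosses a cell edge.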
Nearest Driver Query
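def find_nearest_drivers(pickup_lat, pickup_lon, radius_km=5, limit=10):
    # geohash_cell, get_neighbors, and get_shard are helpers assumed to exist elsewhere.
    cell = geohash_cell(pickup_lat, pickup_lon)
    adjacent_cells = get_neighbors(cell)
    results = []
    for c in [cell] + adjacent_cells:
        redis_shard = get_shard(c)  # assumed to return a redis-py client for cell c
        # withdist=True makes each hit a [vehicle_id, distance_km] pair.
        drivers = redis_shard.georadius(
            f'vehicles:{c}', pickup_lon, pickup_lat,
            radius_km, unit='km', withdist=True, sort='ASC', count=limit
        )
        results.extend(drivers)
    # Merge the per-cell results and keep the closest drivers overall.
    results.sort(key=lambda d: d[1])
    return results[:limit]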
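One detail the merge step may need to handle is duplicates: a vehicle that has just crossed a cell boundary can briefly appear in two shards if the removal from its old cell lags behind. The sketch below, which uses plain tuples rather than Redis responses, keeps the smallest distance per vehicle before taking the top results. The function name `merge_nearest` is hypothetical.

```python
def merge_nearest(per_shard_hits, limit):
    """Merge per-shard results into one nearest-first list without duplicates.

    per_shard_hits: iterable of lists of (vehicle_id, distance_km) pairs.
    """
    best = {}
    for hits in per_shard_hits:
        for vehicle_id, dist in hits:
            # Keep the smallest distance seen for each vehicle.
            if vehicle_id not in best or dist < best[vehicle_id]:
                best[vehicle_id] = dist
    return sorted(best.items(), key=lambda kv: kv[1])[:limit]


shard_a = [("v7", 0.4), ("v2", 1.9)]
shard_b = [("v7", 0.6), ("v9", 1.1)]  # v7 is stale here after crossing cells
print(merge_nearest([shard_a, shard_b], limit=2))  # [('v7', 0.4), ('v9', 1.1)]
```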
ETA Computation
ETA has two components: routing time (the shortest path on the road graph from driver to pickup to destination) and external factors (traffic, time of day, historical congestion). At scale, use precomputed routing tiles plus real-time traffic overlays. For interviews, describe Dijkstra's algorithm on a road graph whose edge weights vary by time of day. Recompute ETAs every 30 seconds as traffic changes.
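The interview-level version can be sketched as Dijkstra's algorithm with edge costs scaled by a time-of-day multiplier. Everything here is illustrative: the tiny `ROADS` graph, the peak-hour set, and the per-edge multipliers are assumptions, and the weights are fixed at the departure hour rather than changing as the trip progresses.

```python
import heapq

PEAK_HOURS = set(range(7, 10)) | set(range(16, 19))  # assumption: illustrative rush hours


def edge_cost(free_flow_s, peak_multiplier, hour):
    """Travel time for one edge, inflated during peak hours."""
    return free_flow_s * (peak_multiplier if hour in PEAK_HOURS else 1.0)


def shortest_time(graph, source, target, hour):
    """Dijkstra over graph[node] = [(neighbor, free_flow_s, peak_multiplier), ...]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, free_flow_s, peak_mult in graph.get(node, []):
            nd = d + edge_cost(free_flow_s, peak_mult, hour)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None  # target unreachable


def trip_eta(graph, driver, pickup, dropoff, hour):
    """ETA in seconds for driver -> pickup -> dropoff, or None if unreachable."""
    to_pickup = shortest_time(graph, driver, pickup, hour)
    to_dropoff = shortest_time(graph, pickup, dropoff, hour)
    if to_pickup is None or to_dropoff is None:
        return None
    return to_pickup + to_dropoff


ROADS = {
    "A": [("B", 60, 3.0), ("C", 100, 1.0)],  # A-B-D is a highway that jams at peak
    "B": [("D", 60, 3.0)],
    "C": [("D", 100, 1.0)],                  # A-C-D is a slower but steady side road
    "D": [("E", 50, 1.0)],
}
print(trip_eta(ROADS, "A", "D", "E", hour=12))  # 170.0 (highway off-peak)
print(trip_eta(ROADS, "A", "D", "E", hour=8))   # 250.0 (side road at peak)
```

The same driver and pickup yield different routes at noon and at 8 a.m., which is the point of time-of-day weighting.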
Trip History Storage
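Storing 1.5 trillion location rows calls for columnar storage: Apache Parquet files on S3, partitioned by (vehicle_id, date) for fast retrieval per vehicle per day. With 1M vehicles that scheme yields about 90M partitions over 90 days, so in practice a coarser key, such as date plus a hash bucket of vehicle_id, may be needed to keep the partition count manageable. Query engine: AWS Athena or Spark SQL. Compaction: the small Parquet files written by the streaming job are compacted daily into larger files for efficient analytics.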
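The daily compaction step can be planned with a simple greedy grouping: pack the small files of one partition into batches up to a target size, then rewrite each batch as a single file. This sketch only does the planning. The 256 MB target and the file names are assumptions, and the actual rewrite would be done by Spark or a similar engine.

```python
TARGET_BYTES = 256 * 1024 * 1024  # assumption: target size of a compacted file


def plan_compaction(files, target_bytes=TARGET_BYTES):
    """Group small files into batches whose total size stays within target_bytes.

    files: list of (path, size_bytes) pairs from a single partition.
    Returns a list of batches, each a list of paths to merge into one output file.
    A file larger than the target ends up in a batch by itself.
    """
    batches, current, current_size = [], [], 0
    for path, size in sorted(files):
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


small_files = [("part-0003", 100), ("part-0001", 100), ("part-0002", 100)]
print(plan_compaction(small_files, target_bytes=250))
# [['part-0001', 'part-0002'], ['part-0003']]
```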
Interview Tips
- Redis GeoSet + GEORADIUS (GEOSEARCH in Redis 6.2 and later) is the canonical answer for nearest-vehicle queries.
- 200K writes/sec requires sharding — lead with geohash-based sharding.
- Separate hot path (Redis, real-time) from cold path (S3, analytics) explicitly.
- ETA = routing + traffic — acknowledge both components.