What Is a Geo-Proximity Service?
A geo-proximity service finds entities (businesses, drivers, users) near a given location. Core operations: store a location, find all entities within radius R of a point, find the K nearest entities. Used by: Yelp (nearby restaurants), Uber (nearby drivers), DoorDash (nearby restaurants), Airbnb (nearby listings).
Data Model
Entity table: entity_id, entity_type, name, latitude, longitude, is_active. Indexes: spatial index on (latitude, longitude) or geohash column. For drivers (frequently updating location): a separate driver_locations table or Redis GEO (in-memory, low latency for reads/writes).
Geohashing
Geohash encodes a (lat, lng) pair into a short string. The encoding divides the world into a grid recursively: split into 4 quadrants, encode each split as a bit. Longer strings = smaller cells = higher precision. At precision 6 (e.g., “9q5c2n”): cell is about 1.2km x 0.6km. At precision 8: 38m x 19m.
For radius search: find the geohash of the center point. Get all neighboring cells (the 8 surrounding geohashes). Query all entities whose geohash starts with any of the 9 prefixes. This gives an approximate radius search with no false negatives but some false positives (cells overlap with the radius — filter those out with Haversine distance).
Store geohash as a VARCHAR with an index. For a 5km radius, use precision 5 cells (~5km x 5km). Adjust precision to match your typical query radius.
Alternative: H3 Hexagonal Grid (Uber)
H3 is Uber open-source hexagonal grid system. Hexagons tile more uniformly than squares (equal distance to all 6 neighbors vs rectangular grids). H3 resolution 9 cells are ~0.1km^2. Each cell has a 64-bit integer ID. Spatial queries: index on H3 cell ID. For radius search, enumerate all cells within the radius using H3 disk/ring functions and query each. H3 is used by Uber for surge pricing zones, demand mapping, and driver dispatch.
Redis GEO Commands
Redis provides native geo commands using a geohash-based sorted set internally:
- GEOADD key lng lat member: add/update a location. O(log N).
- GEODIST key m1 m2 unit: distance between two members. O(log N).
- GEORADIUS key lng lat radius unit [ASC] [COUNT N]: all members within radius, sorted by distance. O(N+log M).
- GEORADIUSBYMEMBER key member radius unit: radius search centered on an existing member.
Best for: frequently updating locations (drivers, delivery couriers). At 1M active drivers updating every 4 seconds: 250K GEOADD/second — requires Redis Cluster. Reads (GEORADIUS for nearest drivers) are O(N) for returned results and extremely fast from RAM.
PostGIS for Persistent Geo Data
PostgreSQL with PostGIS extension provides full geospatial support: ST_DWithin(location, target, meters) uses a spatial index (R-tree via GIST) for fast radius queries. Better than Redis for: persistent data (businesses, venues), complex spatial queries (polygon containment, route intersection), join with other tables. Typical query: SELECT * FROM venues WHERE ST_DWithin(location, ST_MakePoint(lng, lat)::geography, 5000) ORDER BY location ST_MakePoint(lng, lat) LIMIT 20.
System Architecture
For a Yelp-like service: business locations in PostGIS (infrequent writes, complex queries). For a driver-tracking service: driver locations in Redis GEO (frequent writes, simple radius queries). Serve geo queries through a dedicated Location Service — do not expose the database directly. Cache popular searches (nearby restaurants in a geohash cell) in Redis with a 30-second TTL. Use CDN for static location data (venue metadata) once fetched.
Dealing with Edge Cases
Anti-meridian (longitude wrapping at 180/-180): geohash handles this, but custom lat/lng queries need special handling. Poles: high latitudes cause distortion in Mercator projections — use sphere-based distance (Haversine). Precision selection: match geohash precision to your typical query radius. If users commonly search within 1km, use precision 6 (cells are ~1.2km). Too coarse = too many false positives; too fine = might miss entities in neighboring cells.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does geohashing work for proximity searches?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Geohash encodes (latitude, longitude) into a short alphanumeric string by recursively subdividing the world into a grid. Each character adds precision: a 6-character geohash represents a cell of about 1.2km x 0.6km. For a radius search: hash the center point to get a geohash, then enumerate the 8 neighboring cells (same precision). Query all entities whose geohash starts with any of the 9 cell prefixes. This returns all entities in those cells — apply Haversine distance to filter to the exact radius. The prefix index on geohash makes these queries fast (B-tree index). Limitation: cells near the anti-meridian (180 longitude) or poles have distorted shapes — use Haversine for final distance verification.”
}
},
{
“@type”: “Question”,
“name”: “What is the difference between Redis GEO and PostGIS for location storage?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Redis GEO stores locations in a sorted set using geohash internally. Best for high write throughput (drivers updating location every 4 seconds) and fast in-memory reads. GEORADIUS returns nearby members in O(N+log M). Data is not durably persisted by default — use RDB snapshots or AOF. PostGIS is a PostgreSQL extension that adds spatial types (GEOMETRY, GEOGRAPHY) and spatial indexes (R-tree via GiST). Best for: durable persistent location data (businesses, venues), complex spatial queries (polygon containment, routing), and joining with other relational data. ST_DWithin uses the spatial index for fast radius queries. For ride-sharing: Redis for driver locations (high write, ephemeral), PostGIS for venue/business data (low write, permanent).”
}
},
{
“@type”: “Question”,
“name”: “How does Uber H3 hexagonal grid differ from square geohash grids?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “H3 uses a hexagonal grid rather than rectangular cells. Hexagons have uniform distance to all 6 neighbors (vs rectangular grids where corner neighbors are farther than edge neighbors). This makes hexagons better for visualizing geographic distributions and computing neighbor sets. H3 cells have 16 resolutions: resolution 9 cells are about 0.1 km^2 (about 174m diameter). Each cell has a unique 64-bit integer ID. Disk and ring functions enumerate cells within a given distance efficiently. Uber uses H3 for surge pricing zones (each zone is a set of cells), demand heatmaps, and supply/demand analytics. For storage, index on the H3 cell ID integer column using a B-tree.”
}
},
{
“@type”: “Question”,
“name”: “How do you handle provider location updates at scale (1M active drivers)?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “1M drivers updating every 4 seconds = 250,000 writes/second. Redis Cluster sharded by driver_id handles this. Each driver update: GEOADD {shard_key} lng lat driver_id. On trip request: GEORADIUS {shard_key} lng lat 5 km ASC COUNT 20 — but this only queries one shard. Solution: either shard by geo-region (all drivers in city X on shard Y), or maintain a city-level index that maps driver_id to city and route GEORADIUS to the correct shard. For global services: region-based sharding (North America on cluster A, Europe on cluster B) keeps the data close to the drivers and the query servers that serve them. Geo-region sharding also reduces cross-datacenter latency.”
}
},
{
“@type”: “Question”,
“name”: “How do you compute the distance between two geographic coordinates?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “The Haversine formula computes the great-circle distance between two points on a sphere. It accounts for the curvature of the Earth. Formula: a = sin^2(delta_lat/2) + cos(lat1)*cos(lat2)*sin^2(delta_lng/2); c = 2*atan2(sqrt(a), sqrt(1-a)); distance = R*c where R = 6371 km. For short distances (< 50km), the flat-earth approximation is acceptable and cheaper to compute. For databases: ST_Distance (PostGIS) uses the spheroid for maximum accuracy. Redis GEODIST uses the Haversine formula internally. In application code, use a library (geopy, h3) rather than hand-rolling Haversine — edge cases near the poles and anti-meridian are tricky to handle correctly."
}
}
]
}
Asked at: Uber Interview Guide
Asked at: Lyft Interview Guide
Asked at: DoorDash Interview Guide
Asked at: Airbnb Interview Guide
Asked at: Snap Interview Guide