What Is a Location-Based Services System?
A location-based services (LBS) system enables geospatial queries: find nearby restaurants, show points of interest on a map, compute distances, and provide routing. Examples: Yelp (nearby search), Google Maps Places API, Foursquare. Core challenges: geospatial indexing for efficient proximity queries, handling moving objects (vehicles vs static POIs), and global scale (billions of GPS coordinates).
System Requirements
Functional
- Add/update/delete locations (businesses, POIs)
- Search nearby: find N closest locations within radius R of a point
- Filter: by category, rating, open hours
- Geofencing: detect when a device enters/exits a defined area
Non-Functional
- 500M locations globally, 100K searches/second
- Search latency <100ms
- 1M location updates/second (for moving objects)
Geospatial Indexing: Geohash
Geohash divides the world into a grid of cells using a base-32 encoded string. Longer hash = smaller cell. Precision: 6 chars ≈ 1.2km x 0.6km; 7 chars ≈ 153m x 153m. Properties: locations with the same geohash prefix are geographically close (mostly — boundary edge cases exist). Nearby cells share long common prefixes.
import geohash  # python-geohash, which provides both encode() and neighbors()

# Encode the query point into a geohash string
lat, lon = 37.7749, -122.4194  # example query point (San Francisco)
target_hash = geohash.encode(lat, lon, precision=7)  # "9q8yyk8"

# Search nearby: the target cell plus its 8 surrounding cells
search_cells = [target_hash] + geohash.neighbors(target_hash)
# DB query: run per cell, or use IN over all 9 cells
SELECT * FROM locations
WHERE geohash7 LIKE '9q8yyk8%'   -- all locations in one cell
  AND category = 'restaurant'
LIMIT 100;
-- SQL has no built-in distance(); compute exact Haversine distances in
-- the application (or a UDF), sort, and return the top 20
DB index on geohash prefix enables O(log N) range queries per cell. The 9-cell search (target + 8 neighbors) handles boundary cases where the nearest location is in an adjacent cell.
Alternative: Quadtree
A quadtree recursively subdivides a 2D area into four quadrants until each leaf contains fewer than K points (e.g., K=100). Pros: adaptive density — dense cities get finer subdivision, sparse areas get coarser. Cons: complex updates (rebalancing on insert/delete), harder to distribute. Geohash is preferred for DB-backed systems; quadtree for in-memory spatial indexes (game engines, GIS tools).
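The split-on-capacity behavior can be illustrated with a minimal in-memory point quadtree; the `Node` class and `CAPACITY` threshold below are illustrative sketches, not a production index:

```python
# Minimal point quadtree sketch: a leaf splits into four children once it
# holds more than CAPACITY points, so dense regions subdivide more deeply.
CAPACITY = 4

class Node:
    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []       # (x, y) pairs while this node is a leaf
        self.children = None   # list of four child Nodes after a split

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > CAPACITY:
            x0, y0, x1, y1 = self.bounds
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            self.children = [Node(x0, y0, xm, ym), Node(xm, y0, x1, ym),
                             Node(x0, ym, xm, y1), Node(xm, ym, x1, y1)]
            for px, py in self.points:   # redistribute points to children
                self._child_for(px, py).insert(px, py)
            self.points = []

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= xm else 0) + (2 if y >= ym else 0)]
```

Note how the tree structure is mutated on every overflow: this is the rebalancing cost mentioned above, and why a distributed, DB-backed setup favors the static geohash grid instead.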
Data Model
locations: id, name, lat, lon, geohash7, category, rating,
hours_json, address, created_at
location_tags: location_id, tag
-- Indexes:
CREATE INDEX ON locations (geohash7, category);
CREATE INDEX ON locations (geohash7, rating DESC);
Search Flow
- Compute target geohash at precision 7 (153m cell)
- Get 8 neighbors
- Query DB: WHERE geohash7 IN (9 cells) AND category = ? LIMIT 100
- Compute exact distances (Haversine formula) for each result
- Sort by distance, apply rating/filter, return top 20
- If fewer than 20 results: expand to precision-6 cells (1.2km) and repeat
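The exact-distance step above uses the standard Haversine formula; a minimal sketch, where the candidate rows and coordinates are made up for illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Rank candidates returned by the 9-cell query (hypothetical rows):
candidates = [("A", 37.7749, -122.4194), ("B", 37.7793, -122.4193)]
user = (37.7760, -122.4170)
ranked = sorted(candidates,
                key=lambda c: haversine_km(user[0], user[1], c[1], c[2]))
```

Because the 9-cell query returns at most ~100 candidates, this O(1)-per-pair distance computation and the final sort are cheap relative to the indexed DB lookup.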
Caching
Search results are highly cacheable: the same query (cell, category, radius) returns the same results for minutes. Cache key: "{geohash6}:{category}:{radius}". TTL: 60 seconds for static POIs (restaurants), 5 seconds for dynamic objects (drivers, riders). Redis stores the result set as a JSON array. Cache invalidation: when a location is added or deleted, invalidate the cache entries for the cells it falls in.
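The key scheme and TTL behavior can be sketched with an in-process stand-in for Redis (a plain dict plus expiry timestamps; `cache_key`, `cache_set`, and `cache_get` are illustrative names, not a Redis client API):

```python
import time

# In-process stand-in for the Redis cache described above; the key scheme
# ("{geohash6}:{category}:{radius}") and per-entry TTLs mirror the text.
_cache = {}  # key -> (expires_at, value)

def cache_key(geohash6, category, radius_m):
    return f"{geohash6}:{category}:{radius_m}"

def cache_set(key, value, ttl_s):
    """Store a result set with a TTL (60s for static POIs, 5s for dynamic)."""
    _cache[key] = (time.monotonic() + ttl_s, value)

def cache_get(key):
    """Return the cached value, or None if missing or expired."""
    entry = _cache.get(key)
    if entry is None or entry[0] < time.monotonic():
        _cache.pop(key, None)  # lazy eviction of expired entries
        return None
    return entry[1]
```

In production the same scheme maps onto Redis SET with the EX option; keying on the precision-6 cell rather than raw coordinates is what makes nearby users share cache hits.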
Geofencing
Detect when a device enters or exits a polygon. Storage: geofences as GeoJSON polygons in PostGIS. Point-in-polygon check: ST_Contains(geofence_polygon, ST_Point(lon, lat)) (note the argument order: PostGIS points are (x, y), so longitude comes first). For 1M device updates/second, use geohash to pre-filter: only check geofences whose bounding box overlaps the device's geohash cell. This reduces polygon checks from all geofences to the few dozen in the local area.
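The bounding-box pre-filter and the enter/exit detection can be sketched as follows; the function names and the idea of diffing membership sets between consecutive position updates are illustrative (the precise polygon check itself stays in PostGIS):

```python
def bbox_contains(bbox, lat, lon):
    """Coarse pre-filter: bbox = (min_lat, min_lon, max_lat, max_lon).
    Only geofences passing this check go to the exact ST_Contains query."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def geofence_events(prev_ids, curr_ids):
    """Diff the sets of geofence IDs containing the device on two
    consecutive updates: new IDs are enters, dropped IDs are exits."""
    return {"enter": curr_ids - prev_ids, "exit": prev_ids - curr_ids}
```

Keeping the previous membership set per device (e.g., in Redis) turns geofencing into a cheap set difference per update, with the expensive polygon math run only on the pre-filtered candidates.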
Interview Tips
- Geohash + 9-cell neighbor search is the standard nearby-POI solution.
- Haversine formula for great-circle distance — O(1), no index needed for final sort.
- Adaptive radius expansion (precision 7 → 6 → 5) handles sparse areas.
- PostGIS for complex geofencing; Redis GeoSet for simple radius queries.
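The adaptive radius expansion tip reduces to geohash prefix truncation, since chopping characters off a geohash widens the cell; a minimal sketch, where `search_fn` is a hypothetical stand-in for the 9-cell DB query:

```python
def expand_search(geohash7, min_results, search_fn):
    """Widen the search cell (precision 7 -> 6 -> 5) by truncating the
    geohash prefix until search_fn returns enough results."""
    results = []
    for precision in (7, 6, 5):
        results = search_fn(geohash7[:precision])
        if len(results) >= min_results:
            break
    return results
```

Each truncation multiplies the covered area by roughly 32x (one base-32 character), so sparse rural queries converge in at most two extra round trips.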