System Design Interview: Design a Location-Based Services System (Yelp/Google Maps)

What Is a Location-Based Services System?

A location-based services (LBS) system enables geospatial queries: find nearby restaurants, show points of interest on a map, compute distances, and provide routing. Examples: Yelp (nearby search), Google Maps Places API, Foursquare. Core challenges: geospatial indexing for efficient proximity queries, handling moving objects (vehicles vs static POIs), and global scale (billions of GPS coordinates).

  • LinkedIn Interview Guide
  • Snap Interview Guide
  • Airbnb Interview Guide
  • DoorDash Interview Guide
  • Lyft Interview Guide
  • Uber Interview Guide
  • System Requirements

    Functional

    • Add/update/delete locations (businesses, POIs)
    • Search nearby: find N closest locations within radius R of a point
    • Filter: by category, rating, open hours
    • Geofencing: detect when a device enters/exits a defined area

    Non-Functional

    • 500M locations globally, 100K searches/second
    • Search latency <100ms
    • 1M location updates/second (for moving objects)

    Geospatial Indexing: Geohash

    Geohash divides the world into a grid of cells using a base-32 encoded string. Longer hash = smaller cell. Precision: 6 chars ≈ 1.2km x 0.6km; 7 chars ≈ 153m x 153m. Properties: locations with the same geohash prefix are geographically close (mostly — boundary edge cases exist). Nearby cells share long common prefixes.

    import geohash2
    
    # Encode
    hash = geohash2.encode(lat, lon, precision=7)  # e.g., "9q8yy"
    
    # Search nearby: get the target cell + 8 neighbors
    neighbors = geohash2.neighbors(target_hash)  # 8 surrounding cells
    search_cells = [target_hash] + neighbors
    
    # DB query
    SELECT * FROM locations
    WHERE geohash LIKE 'dp3wj%'  # all locations in cell
    AND category = 'restaurant'
    ORDER BY distance(lat, lon, ?, ?)
    LIMIT 20
    

    DB index on geohash prefix enables O(log N) range queries per cell. The 9-cell search (target + 8 neighbors) handles boundary cases where the nearest location is in an adjacent cell.

    Alternative: Quadtree

    A quadtree recursively subdivides a 2D area into four quadrants until each leaf contains fewer than K points (e.g., K=100). Pros: adaptive density — dense cities get finer subdivision, sparse areas get coarser. Cons: complex updates (rebalancing on insert/delete), harder to distribute. Geohash is preferred for DB-backed systems; quadtree for in-memory spatial indexes (game engines, GIS tools).

    Data Model

    locations: id, name, lat, lon, geohash7, category, rating,
               hours_json, address, created_at
    location_tags: location_id, tag
    
    -- Indexes:
    CREATE INDEX ON locations (geohash7, category);
    CREATE INDEX ON locations (geohash7, rating DESC);
    

    Search Flow

    1. Compute target geohash at precision 7 (153m cell)
    2. Get 8 neighbors
    3. Query DB: WHERE geohash7 IN (9 cells) AND category = ? LIMIT 100
    4. Compute exact distances (Haversine formula) for each result
    5. Sort by distance, apply rating/filter, return top 20
    6. If fewer than 20 results: expand to precision-6 cells (1.2km) and repeat

    Caching

    Search results are highly cacheable: same query (location, radius, category) returns the same results for minutes. Cache key: “{geohash6}:{category}:{radius}”. TTL: 60 seconds for static POIs (restaurants), 5 seconds for dynamic (drivers, riders). Redis stores the result set as a JSON array. Cache invalidation: when a location is added/deleted, invalidate all cells that overlap with it.

    Geofencing

    Detect when a device enters or exits a polygon. Storage: store geofences as GeoJSON polygons in PostGIS. Point-in-polygon check: ST_Contains(geofence_polygon, ST_Point(lat, lon)). For 1M device updates/second: use geohash to pre-filter — only check geofences whose bounding box overlaps the device’s geohash cell. This reduces polygon checks from all geofences to the few dozen in the local area.

    Interview Tips

    • Geohash + 9-cell neighbor search is the standard nearby-POI solution.
    • Haversine formula for great-circle distance — O(1), no index needed for final sort.
    • Adaptive radius expansion (precision 7 → 6 → 5) handles sparse areas.
    • PostGIS for complex geofencing; Redis GeoSet for simple radius queries.

    {
    “@context”: “https://schema.org”,
    “@type”: “FAQPage”,
    “mainEntity”: [
    {
    “@type”: “Question”,
    “name”: “How does Geohash indexing enable efficient proximity searches?”,
    “acceptedAnswer”: { “@type”: “Answer”, “text”: “Geohash encodes a (lat, lng) coordinate into a short alphanumeric string by recursively subdividing the world into cells. Characters share a common prefix if they are geographically close. A 6-character geohash covers roughly a 1.2km x 0.6km cell; 7 characters cover ~150m x 75m. For proximity search: compute the geohash of the query point, compute its 8 neighbors (N, NE, E, SE, S, SW, W, NW) using standard neighbor algorithms, then query: WHERE geohash LIKE {prefix}% OR geohash IN ({8 neighbors}). This is efficient with a B-tree index on geohash. The edge case: a target just outside the search radius may be in a neighboring cell not queried, and a target just inside the radius may be far in the same cell. Always apply a secondary Haversine distance filter on the candidates returned by the geohash query to get exact results.” }
    },
    {
    “@type”: “Question”,
    “name”: “What are the trade-offs between Geohash and Quadtree for spatial indexing?”,
    “acceptedAnswer”: { “@type”: “Answer”, “text”: “Geohash: fixed-grid cell decomposition. Encodes coordinates to a string; stored in standard SQL/NoSQL indexes. Simple to implement. Problem: cells are fixed size — high-density areas (Manhattan) and low-density areas (Nevada desert) get the same cell size, causing hot cells with thousands of businesses vs. cold cells with none. Quadtree: recursive spatial decomposition that adapts to data density. A cell is split only when it exceeds a threshold (e.g., 50 businesses per node). Results in balanced tree depth regardless of geographic density. Best for: driver tracking systems where drivers cluster in cities, Yelp-style local search in mixed urban/rural areas. Trade-off: Quadtree requires an in-memory or distributed tree structure, not a simple string index. Geohash is simpler and works well when data density is reasonably uniform. Quadtrees handle hot-spot density skew better.” }
    },
    {
    “@type”: “Question”,
    “name”: “How do you handle the geofencing use case at scale?”,
    “acceptedAnswer”: { “@type”: “Answer”, “text”: “Geofencing: trigger an event when a moving entity (driver, user) enters or exits a defined polygon (delivery zone, restricted area). At scale (1M geofences, 100K moving entities): (1) Spatial index all geofences with R-tree or PostGIS GIST index. Index geofence bounding boxes first for coarse filtering, then do precise polygon containment check only on candidates. (2) For real-time entity movement: each entity reports position every 5 seconds. Compute which geofences contain the new position. Compare against which geofences contained the previous position. Difference = enter/exit events. (3) At 100K entities * 5-second updates = 20K position updates/second. Use a Redis geospatial set (GEOADD) for fast neighbor lookup, batch geofence checks, and publish enter/exit events to Kafka for downstream processing (notifications, billing, analytics).” }
    }
    ]
    }

    Scroll to Top