System Design Interview: Hotel and Airbnb Booking System

Designing a hotel or home-sharing booking system like Airbnb is a comprehensive system design problem that combines availability calendars, concurrent booking prevention, search with geospatial queries, pricing engines, and payment workflows.

Core Requirements

Functional: Hosts list properties with pricing and availability. Guests search by location/dates/guests. Book a property (with payment). Cancel bookings (with refund policy). Messaging between hosts and guests. Reviews after stay.

Non-functional: 100M properties, 500M users, 1M bookings/day. Search < 500ms p99. No double-booking. Pricing updates in real-time. 99.99% uptime for booking flow.

Availability and Double-Booking Prevention

Core problem: two guests book the same property on overlapping dates

Naive approach (WRONG — race condition):
  1. Guest A: check if Oct 1-5 available → YES
  2. Guest B: check if Oct 3-7 available → YES
  3. Guest A: create booking Oct 1-5 → SUCCESS
  4. Guest B: create booking Oct 3-7 → SUCCESS (double-booked!)

Solution: optimistic locking with version counter

properties table:
  id           UUID PRIMARY KEY
  host_id      UUID
  name         TEXT
  ...

property_availability table:
  property_id  UUID
  date         DATE
  status       ENUM(available, booked, blocked)
  booking_id   UUID NULL
  version      INT DEFAULT 0
  PRIMARY KEY (property_id, date)

Booking flow (atomic):
  BEGIN;

  -- Lock all dates in range (sorted to prevent deadlock)
  SELECT * FROM property_availability
  WHERE property_id = ? AND date BETWEEN ? AND ?
  FOR UPDATE;  -- row-level locks on each date

  -- Verify all dates are available
  IF any row has status != 'available': ROLLBACK; return 409

  -- Create booking record
  INSERT INTO bookings (id, property_id, guest_id, check_in, check_out, status)
  VALUES (?, ?, ?, ?, ?, 'pending');

  -- Mark dates as booked
  UPDATE property_availability
  SET status='booked', booking_id=?
  WHERE property_id=? AND date BETWEEN ? AND ?;

  COMMIT;  -- releases all locks

Availability Calendar Design

Date-per-row model (recommended for flexibility):
  One row per (property, date)
  5 years × 365 days × 100M properties = 182B rows (use Cassandra or sharding)

Range model (compact, harder to query):
  blocked_dates: (property_id, start_date, end_date, status)
  Overlap check: start_date  requested_start
  Problem: updating a sub-range requires splitting existing ranges

Recommendation: Date-per-row model with Cassandra
  Partition key: property_id
  Clustering key: date
  Supports: fast range scans, atomic date updates, O(1) availability check per date

Redis cache for availability (hot properties):
  Key: avail:{property_id}:{year_month}
  Value: bitmap where bit N = availability of day N
  BITCOUNT to count available days in a month
  GETBIT to check a specific date
  TTL: 5 minutes (refresh on booking/cancellation)

Search Architecture

Search requirements: location + dates + guests + filters
  "Find properties in San Francisco for 2 guests, Oct 1-5, with pool"

Two-phase search:
  Phase 1: Geospatial candidate retrieval (fast)
    Elasticsearch geo_distance query:
      center: {lat: 37.7749, lon: -122.4194}
      radius: 50km
      filter: guests_capacity >= 2, amenities contains pool
    Returns: ~1000 candidate property_ids

  Phase 2: Availability filter (targeted)
    For each candidate property_id, check availability table:
      SELECT COUNT(*) FROM property_availability
      WHERE property_id IN (candidates)
        AND date BETWEEN Oct 1 AND Oct 5
        AND status = 'available'
    Keep only properties where COUNT = 5 (all nights available)

  Phase 3: Ranking (ML model)
    Score by: relevance, host rating, price, personalization
    Return top 50 results

Geospatial indexing:
  Elasticsearch: built-in geo_point field, geo_distance aggregations
  PostgreSQL: PostGIS extension, ST_DWithin for radius queries
  Alternative: H3 hexagonal cells for region-based partitioning

Pricing Engine

Dynamic pricing factors:
  Base price:    set by host
  Demand boost:  events nearby (concert +20%), holidays (+30%)
  Supply factor: few properties available in area (+15%)
  Lead time:     last-minute discounts (-10%)
  Length of stay: weekly discount (-5%), monthly discount (-20%)
  Seasonal:      summer peak in beach cities

Price calculation:
  nightly_price = base_price
                × demand_multiplier(location, date)
                × length_of_stay_discount(nights)
                × last_minute_factor(days_until_checkin)

  total_price = sum(nightly_prices) + cleaning_fee + service_fee

Service fee: Airbnb takes ~3% from host + ~13% from guest

Real-time price display:
  Price API: GET /properties/{id}/price?checkin=...&checkout=...
  → Computes per-night breakdown
  → Cached per (property, date_range) for 60 seconds
  → Invalidated when host changes base price

Booking State Machine

States:
  PENDING → CONFIRMED → ACTIVE → COMPLETED
      ↓
  CANCELLED (at any pre-active stage)

PENDING:    payment authorized (not captured), awaiting host confirmation
CONFIRMED:  host accepted, payment captured
ACTIVE:     guest has checked in
COMPLETED:  stay complete, reviews unlocked
CANCELLED:  cancelled by guest or host; refund per policy

Instant Book (no host confirmation needed):
  PENDING → CONFIRMED immediately
  Payment captured synchronously

Refund policy variants:
  Flexible:    full refund if cancelled 24h before check-in
  Moderate:    full refund if cancelled 5 days before
  Strict:      50% refund if cancelled 14 days before
  Non-refundable: no refund

Cancellation flow:
  1. Calculate refund amount per policy
  2. Stripe refund via refunds.create() (async, 5-10 days)
  3. Mark dates available again
  4. Notify host via email/push

Database Design

bookings table:
  id            UUID PRIMARY KEY
  property_id   UUID NOT NULL
  guest_id      UUID NOT NULL
  host_id       UUID NOT NULL  -- denormalized for fast host queries
  check_in      DATE NOT NULL
  check_out     DATE NOT NULL  -- exclusive end date
  guests_count  INT
  status        TEXT DEFAULT 'pending'
  total_cents   BIGINT
  currency      CHAR(3) DEFAULT 'USD'
  payment_intent_id TEXT  -- Stripe PaymentIntent
  created_at    TIMESTAMPTZ DEFAULT NOW()

  INDEX (guest_id, created_at DESC)
  INDEX (host_id, check_in DESC)
  INDEX (property_id, status, check_in)

properties table:
  id            UUID PRIMARY KEY
  host_id       UUID
  name          TEXT
  description   TEXT
  lat           DOUBLE PRECISION
  lng           DOUBLE PRECISION
  guests_max    INT
  bedrooms      INT
  bathrooms     NUMERIC(2,1)
  amenities     JSONB  -- {pool: true, wifi: true, ...}
  base_price_cents INT
  cleaning_fee_cents INT
  instant_book  BOOLEAN DEFAULT FALSE
  status        TEXT DEFAULT 'active'  -- active, suspended, deactivated

Interview Discussion Points

  • Why SELECT FOR UPDATE instead of optimistic locking? Date-level availability requires locking all dates in the requested range atomically. Optimistic locking (check-and-update with version) works well for single-row updates but becomes complex for multi-row ranges (need to check N versions simultaneously). SELECT FOR UPDATE with sorted date locks is simpler and prevents deadlocks when two bookings compete for overlapping date ranges.
  • How to handle Airbnb’s split payment model? Guests pay immediately (via Stripe Payment Intent authorize-only), hosts receive payment 24 hours after check-in. The platform holds the funds in escrow (Stripe Connect platform account). This protects both parties: if the host cancels last-minute, the guest gets a full refund; if the guest no-shows, the host gets paid per the cancellation policy.
  • Search scalability: Elasticsearch handles the geospatial + filter query. The availability check runs against Cassandra (fast range reads by property_id + date). At 100M properties with 365 days each, a single Cassandra cluster with 20 nodes handles the availability reads at < 5ms p99.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How do you prevent double-booking in a hotel reservation system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Double-booking is prevented by using database row-level locking on the availability records for each requested date. When a guest attempts to book, the system begins a database transaction and issues SELECT FOR UPDATE on all availability rows for the property and date range (sorted by date to prevent deadlock). This blocks any concurrent booking attempting to lock the same rows. After verifying all dates show status=’available’, the system inserts the booking record and updates the availability rows to status=’booked’ atomically within the transaction. If any date is already booked, the transaction is rolled back and the guest receives a 409 Conflict response. This guarantees exactly-once booking even under high concurrent load.”
}
},
{
“@type”: “Question”,
“name”: “How does the availability calendar work in an Airbnb-style system at scale?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “The availability calendar uses a date-per-row model: one database row per (property_id, date) storing the availability status. This enables fast range scans (check availability for a 7-night stay = 7 row reads), atomic date-level updates (lock only the dates being booked), and simple status transitions. For 100M properties u00d7 365 days u00d7 5 years = 182B rows, a wide-column database like Cassandra or DynamoDB is used with property_id as the partition key and date as the clustering key. A Redis bitmap cache (1 bit per day per property, TTL 5 minutes) provides instant availability checks without database reads for popular properties u2014 invalidated on every booking or cancellation.”
}
},
{
“@type”: “Question”,
“name”: “How does geospatial search work in a property booking platform?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Property search uses a two-phase approach. Phase 1 (candidate retrieval): an Elasticsearch cluster with geo_point indexed fields handles “find all properties within 50km of San Francisco with capacity >= 2 and pool amenity” u2014 this geospatial + filter query returns ~1,000 candidate property IDs in < 100ms. Phase 2 (availability filter): for each candidate property, the system checks the availability database to verify all requested dates are open. This targeted lookup (1,000 properties u00d7 7 nights = 7,000 row reads) runs in parallel and takes ~50ms. Results are then scored by a ranking model (price, rating, host acceptance rate, personalization) and the top 50 are returned to the client."
}
}
]
}

  • DoorDash Interview Guide
  • Lyft Interview Guide
  • Atlassian Interview Guide
  • Stripe Interview Guide
  • Uber Interview Guide
  • Companies That Ask This

    Scroll to Top