System Design: API Marketplace — Developer Portal, Rate Limiting, and Billing (2025)

What is an API Marketplace?

An API marketplace (think RapidAPI, or the developer platforms behind Stripe, Twilio, and AWS API Gateway) lets external developers discover, subscribe to, and consume APIs, with metered billing, rate limiting, and analytics. Core components:

- Developer Portal: API discovery, documentation, interactive sandbox, API key management.
- API Gateway: authenticates requests, enforces rate limits, routes to backend services, collects usage metrics.
- Billing Engine: meters API calls, applies pricing tiers, generates invoices.
- Analytics: per-API, per-consumer usage dashboards.

The key system design challenges: rate limiting at scale (millions of requests per second), accurate metered billing (no missed or double-counted calls), and low-latency gateway processing (< 5ms overhead per request).

API Key Authentication and Routing

import hashlib
import json
from datetime import datetime, timezone

class APIGateway:
    def handle_request(self, request: Request) -> Response:
        # 1. Extract and validate API key
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            return Response(401, "Missing API key")

        # 2. Look up the key in Redis (cached from the DB)
        # Key: apikey:{sha256(api_key)} -> {"consumer_id", "plan_id", "is_active"}
        # Use a stable hash (SHA-256), not Python's built-in hash(), which
        # varies across processes and restarts.
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        raw = self.redis.get(f"apikey:{key_hash}")
        key_data = json.loads(raw) if raw else None
        if not key_data or not key_data["is_active"]:
            return Response(401, "Invalid or inactive API key")

        # 3. Rate limit check (see below)
        if not self.rate_limiter.allow(key_data["consumer_id"],
                                       key_data["plan_id"]):
            return Response(429, "Rate limit exceeded")

        # 4. Route to backend
        backend = self.router.get_backend(request.path)
        response = backend.forward(request)

        # 5. Async usage logging (fire and forget)
        self.usage_logger.log_async(UsageEvent(
            consumer_id=key_data["consumer_id"],
            api_id=backend.api_id,
            plan_id=key_data["plan_id"],
            endpoint=request.path,
            method=request.method,
            status_code=response.status_code,
            latency_ms=response.latency_ms,
            timestamp=datetime.now(timezone.utc),
        ))
        return response
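
The usage_logger in step 5 can be a thin wrapper around a Kafka producer, so logging never blocks the request path. A minimal sketch, assuming the confluent-kafka client and a hypothetical api-usage-events topic (names are illustrative):

import json
from confluent_kafka import Producer  # assumes confluent-kafka is installed

class UsageLogger:
    """Fire-and-forget usage logging: enqueue locally, let the producer's
    background thread ship batches to Kafka."""

    def __init__(self, bootstrap_servers: str, topic: str = "api-usage-events"):
        self.producer = Producer({
            "bootstrap.servers": bootstrap_servers,
            "linger.ms": 5,  # batch up to 5ms of events per broker request
        })
        self.topic = topic

    def log_async(self, event: UsageEvent) -> None:
        # produce() only appends to an in-memory queue (sub-millisecond);
        # keying by consumer_id keeps one consumer's events ordered per partition.
        self.producer.produce(
            self.topic,
            key=str(event.consumer_id),
            value=json.dumps(event.__dict__, default=str),
        )
        self.producer.poll(0)  # serve delivery callbacks without blocking

    def close(self) -> None:
        self.producer.flush()  # drain the queue on shutdown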

Rate Limiting: Token Bucket with Redis

Token bucket algorithm in Redis Lua (atomic, no race conditions): each consumer has a bucket with a capacity and refill rate (defined by their plan). The Lua script runs atomically on Redis — no other commands can interleave.

-- Redis Lua script: token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens based on elapsed time
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)

if tokens >= requested then
    redis.call("HMSET", key, "tokens", tokens - requested, "last_refill", now)
    redis.call("EXPIRE", key, 86400)
    return 1  -- allowed
else
    redis.call("HMSET", key, "tokens", tokens, "last_refill", now)
    redis.call("EXPIRE", key, 86400)
    return 0  -- denied
end
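
The gateway loads this script once at startup and calls it on every request. A minimal sketch of the rate limiter wrapper, assuming redis-py's register_script (which caches the script and calls EVALSHA under the hood); the per-plan limits table here is illustrative:

import time
import redis

with open("token_bucket.lua") as f:
    TOKEN_BUCKET_LUA = f.read()

class RateLimiter:
    def __init__(self, redis_client: redis.Redis, plans: dict):
        self.script = redis_client.register_script(TOKEN_BUCKET_LUA)
        # plan_id -> {"capacity": burst size, "refill_rate": tokens per second}
        self.plans = plans

    def allow(self, consumer_id: str, plan_id: str, requested: int = 1) -> bool:
        plan = self.plans[plan_id]
        allowed = self.script(
            keys=[f"ratelimit:{consumer_id}"],
            args=[plan["capacity"], plan["refill_rate"], time.time(), requested],
        )
        return allowed == 1

One caveat: passing time.time() from the gateway makes the refill math depend on each gateway node's clock; a common alternative is to read the timestamp via Redis's TIME command so every node agrees on "now".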

Metered Billing Pipeline

Every API call is a billing event. Pipeline:

1. The API gateway logs usage events to Kafka (fire-and-forget, < 1ms).
2. A Kafka consumer aggregates events in 1-minute micro-batches and writes them to a usage_events table in ClickHouse (or BigQuery).
3. Monthly invoice generation: at the end of the billing cycle, query total calls per (consumer, api, tier) from ClickHouse.
4. Apply pricing rules: first N calls free, next M calls at $0.001 each, calls over that at $0.0005 each (tiered pricing), as sketched below.
5. Generate an Invoice record in Postgres and charge via Stripe.

For high-volume APIs: pre-aggregate usage counters in Redis (INCR usage:{consumer_id}:{api_id}:{day}) for real-time usage dashboards. Reconcile the Redis counters against ClickHouse aggregates at billing time to catch any Kafka-lag discrepancies.
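
Tiered ("graduated") pricing is easy to get subtly wrong at tier boundaries. A minimal sketch of step 4, with illustrative tier sizes and rates (not taken from any real plan):

from decimal import Decimal

# Graduated tiers: (calls covered by this tier, price per call).
# None means "no upper bound". These numbers are illustrative only.
TIERS = [
    (10_000, Decimal("0")),        # first 10k calls free
    (990_000, Decimal("0.001")),   # next 990k at $0.001 each
    (None, Decimal("0.0005")),     # everything above at $0.0005 each
]

def compute_charge(total_calls: int) -> Decimal:
    """Price one billing cycle's usage: each tier's rate applies only
    to the calls that fall inside that tier."""
    remaining = total_calls
    charge = Decimal("0")
    for size, rate in TIERS:
        if remaining <= 0:
            break
        in_tier = remaining if size is None else min(remaining, size)
        charge += in_tier * rate
        remaining -= in_tier
    return charge

# e.g. 1,500,000 calls -> 10k free + 990k * $0.001 + 500k * $0.0005 = $1,240
assert compute_charge(1_500_000) == Decimal("1240")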

API Analytics and Developer Dashboard

Real-time metrics per consumer per API: requests per minute, error rate (4xx/5xx), p50/p99 latency, top endpoints, geographic distribution. Architecture:

- ClickHouse stores raw usage events with sub-second ingestion lag (columnar, optimized for analytical queries).
- Materialized views pre-aggregate common queries: requests_per_hour, error_rate_by_endpoint, latency_percentiles_daily.
- The developer dashboard reads the pre-aggregated tables for fast responses (< 100ms), as in the sketch below.

SLA monitoring: per-API uptime and latency guarantees are computed from the same usage events; alert when a metric breaches its SLA threshold. API deprecation: track usage of deprecated API versions per consumer and notify high-usage consumers before the sunset date.
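
A minimal sketch of a dashboard query against one of those pre-aggregated tables, assuming the clickhouse-driver client and a hypothetical requests_per_hour schema with (consumer_id, api_id, hour, requests, errors) columns:

from clickhouse_driver import Client  # assumes clickhouse-driver is installed

client = Client(host="clickhouse.internal")  # hypothetical host

def hourly_usage(consumer_id: str, api_id: str, days: int = 7):
    """Requests and error rate per hour for one consumer/API pair,
    read from the pre-aggregated requests_per_hour table."""
    return client.execute(
        """
        SELECT
            hour,
            sum(requests)               AS requests,
            sum(errors) / sum(requests) AS error_rate
        FROM requests_per_hour
        WHERE consumer_id = %(consumer_id)s
          AND api_id = %(api_id)s
          AND hour >= now() - INTERVAL %(days)s DAY
        GROUP BY hour
        ORDER BY hour
        """,
        {"consumer_id": consumer_id, "api_id": api_id, "days": days},
    )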

See also: Stripe Interview Prep

See also: Cloudflare Interview Prep

See also: Shopify Interview Prep
