System Design Interview: Design a Pastebin / Code Snippet Service

Designing Pastebin is a classic beginner-to-intermediate system design question that covers URL shortening, content storage, access control, and expiration. The same design applies to code snippet sharing services such as GitHub Gist and Carbon.

Requirements Clarification

Functional Requirements

  • Create a paste (text/code) and get a short shareable URL
  • View a paste by its short URL
  • Optional: set expiration time (1 hour, 1 day, 1 week, never)
  • Optional: set visibility (public, unlisted, private)
  • Optional: syntax highlighting by language
  • User accounts for managing own pastes

Non-Functional Requirements

  • Scale: 10M pastes/day, 100M reads/day (10:1 read:write ratio)
  • Storage: pastes up to 10MB, average 10KB, retain for up to 10 years
  • Read latency: <100ms
  • Availability: 99.9%
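A quick back-of-envelope estimate from the numbers above helps size the system. This sketch assumes traffic is spread evenly over the day and uses decimal units (1KB = 1,000 bytes) for simplicity; real peak traffic will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = 10_000_000
reads_per_day = 100_000_000
avg_paste_bytes = 10 * 1000     # 10KB average, decimal units
retention_years = 10

# Average request rates (peak will be higher)
write_qps = writes_per_day / SECONDS_PER_DAY  # roughly 116 writes/s
read_qps = reads_per_day / SECONDS_PER_DAY    # roughly 1,157 reads/s

# Storage growth, ignoring expiration, metadata, and replication
daily_storage_gb = writes_per_day * avg_paste_bytes / 1e9           # GB per day
total_storage_tb = daily_storage_gb * 365 * retention_years / 1e3   # TB over retention

print(f"write QPS ~{write_qps:.0f}, read QPS ~{read_qps:.0f}")
print(f"storage ~{daily_storage_gb:.0f} GB/day, ~{total_storage_tb:.0f} TB over {retention_years} years")
```

The roughly 365TB figure is an upper bound: expiring pastes shrink it, while replication and backups multiply it. Either way it points toward object storage rather than a single relational database for paste bodies.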

Short URL Generation

Generate a unique 6-8 character alphanumeric ID for each paste.

Option 1: Random ID

import secrets, string

def generate_id(length=7):
    chars = string.ascii_letters + string.digits  # 62 chars
    return ''.join(secrets.choice(chars) for _ in range(length))
# 62^7 = 3.5 trillion unique IDs - sufficient
# Check DB for collision before inserting (rare but possible)
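A separate SELECT followed by an INSERT can race when two writers pick the same ID at the same moment. One option is to let the primary-key constraint detect the collision and retry with a fresh ID. Below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Postgres. The two-column table and the `create_paste` helper are hypothetical, and a Postgres driver would raise its own integrity error in place of `sqlite3.IntegrityError`.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # 62 chars

def generate_id(length=7):
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

def create_paste(conn, content, max_attempts=5, id_fn=generate_id):
    """Insert a paste, retrying with a new ID if the primary key collides."""
    for _ in range(max_attempts):
        paste_id = id_fn()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO pastes (id, content) VALUES (?, ?)",
                    (paste_id, content),
                )
            return paste_id
        except sqlite3.IntegrityError:
            continue  # ID already taken: try another one
    raise RuntimeError("could not allocate a unique paste ID")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pastes (id TEXT PRIMARY KEY, content TEXT)")
print(create_paste(conn, "print('hello')"))
```

Because the database enforces uniqueness atomically, this approach stays correct with many concurrent API servers. The retry loop almost never runs more than once at realistic fill levels.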

Option 2: Hash-Based

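import hashlib, string, time

BASE62 = string.digits + string.ascii_letters

def generate_id(content, user_id, length=7):
    data = f"{content}{user_id}{time.time()}"
    n = int.from_bytes(hashlib.md5(data.encode()).digest(), "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)
        chars.append(BASE62[r])  # base62-encode the 128-bit digest
    return ''.join(chars)
# hexdigest()[:7] would give only 16^7 (~268M) IDs: far too few at 10M/day.
# Base62 gives 62^7 (~3.5T), but the birthday problem still produces
# occasional collisions at this volume, so keep the uniqueness check on insert.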
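The birthday approximation puts numbers on this risk. Among n uniformly random IDs drawn from N possibilities, the expected number of colliding pairs is about n(n-1)/(2N). The short calculation below compares 7 hex characters (what a truncated `hexdigest()` yields) with 7 base62 characters over one day of writes. It assumes hash outputs behave like uniform random values.

```python
def expected_collisions(n, space):
    """Expected colliding pairs among n uniform random IDs (birthday approximation)."""
    return n * (n - 1) / (2 * space)

daily_writes = 10_000_000
hex7 = 16 ** 7      # 268,435,456 possible IDs
base62_7 = 62 ** 7  # 3,521,614,606,208 possible IDs

print(f"hex, 7 chars:    ~{expected_collisions(daily_writes, hex7):,.0f} collisions/day")
print(f"base62, 7 chars: ~{expected_collisions(daily_writes, base62_7):,.1f} collisions/day")
```

Seven hex characters yield on the order of 186,000 collisions in a single day. Seven base62 characters yield about 14. The per-insert probability is tiny in the base62 case, but it is not zero, so the insert path still needs a uniqueness check.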

Option 3: Distributed ID Generator

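Use a counter-based generator. A central counter service leases ranges of integers (e.g., 1,000 at a time) to each app server, which base62-encodes them locally. A Snowflake-style generator (timestamp + worker ID + sequence, as in Twitter Snowflake) avoids the central counter, but its 64-bit IDs encode to up to 11 base62 characters. Neither approach has any collision risk. The trade-offs are more moving parts and guessable sequential IDs, which weaken unlisted pastes unless the integers are shuffled before encoding. More complex, but worth it at very high scale.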
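A minimal sketch of range allocation follows. The `CounterService` class is a hypothetical in-process stand-in for a real shared counter (for example, a database sequence), and `IdAllocator` is a hypothetical per-server component that serves IDs from its leased range and base62-encodes them.

```python
import string
import threading

BASE62 = string.digits + string.ascii_letters

def base62_encode(n: int) -> str:
    if n == 0:
        return BASE62[0]
    chars = []
    while n > 0:
        n, r = divmod(n, 62)
        chars.append(BASE62[r])
    return ''.join(reversed(chars))

class CounterService:
    """Hypothetical central counter: hands out disjoint [start, end) ranges."""
    def __init__(self, start=0):
        self._next = start
        self._lock = threading.Lock()

    def lease(self, size):
        with self._lock:
            start = self._next
            self._next += size
            return start, start + size

class IdAllocator:
    """Per-server allocator: serves IDs from its leased range, re-leasing when empty."""
    def __init__(self, counter, batch_size=1000):
        self._counter = counter
        self._batch_size = batch_size
        self._current = self._end = 0
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            if self._current >= self._end:
                self._current, self._end = self._counter.lease(self._batch_size)
            n = self._current
            self._current += 1
        return base62_encode(n)

counter = CounterService()
server_a = IdAllocator(counter, batch_size=3)
server_b = IdAllocator(counter, batch_size=3)
print([server_a.next_id() for _ in range(4)], [server_b.next_id() for _ in range(2)])
```

The central counter is contacted only once per batch, so it is not on the hot path of every create. If a server crashes, the unused part of its range is simply skipped, which leaves harmless gaps.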

High-Level Architecture

User
  |
Load Balancer
  |
API Service
  |         |
Create    Read
  |         |
ID Gen    Cache (Redis)
  |         | miss
Paste DB  Paste DB
(Postgres) (read replica)
  |
Object Store (S3)
(for large pastes >1KB)

Storage Design

Database Schema

pastes:
  id          VARCHAR(8) PRIMARY KEY
  user_id     INT (nullable for anonymous)
  title       VARCHAR(255)
  language    VARCHAR(50)    -- syntax highlighting
  visibility  ENUM('public', 'unlisted', 'private')
  size_bytes  INT
  content_key VARCHAR(255)   -- S3 key if stored externally
  content     TEXT           -- inline if <1KB
  created_at  TIMESTAMP
  expires_at  TIMESTAMP (nullable)

Hybrid Storage

  • Small pastes (<1KB): store inline in DB TEXT column for fast retrieval
  • Large pastes (1KB and up): store in S3, keep content_key in DB
  • CDN in front of S3 for public pastes (cache aggressively, with the TTL capped at the paste's expiry)
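A minimal sketch of the inline-vs-object-store routing, using the 1KB threshold above. The `object_store` dict stands in for S3, and the `pastes/<id>` key layout and helper names are hypothetical. A real implementation would upload through an S3 client and persist the returned columns in the `pastes` row. Size is measured in UTF-8 bytes, not characters.

```python
INLINE_LIMIT_BYTES = 1024  # pastes below this size stay in the DB row

object_store = {}  # stand-in for S3: key -> bytes

def store_content(paste_id: str, content: str) -> dict:
    """Return the storage-related columns for a new paste row."""
    data = content.encode("utf-8")
    row = {"id": paste_id, "size_bytes": len(data), "content": None, "content_key": None}
    if len(data) < INLINE_LIMIT_BYTES:
        row["content"] = content          # small: inline TEXT column
    else:
        key = f"pastes/{paste_id}"        # hypothetical key layout
        object_store[key] = data          # large: upload to object store
        row["content_key"] = key
    return row

def load_content(row: dict) -> str:
    if row["content"] is not None:
        return row["content"]
    return object_store[row["content_key"]].decode("utf-8")

small = store_content("abc1234", "hello")
large = store_content("xyz9876", "x" * 5000)
print(small["content_key"], large["content_key"])
```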

Caching Strategy

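import json
from datetime import datetime, timezone

CACHE_TTL = 3600  # cap cache lifetime at 1 hour

# Cache-aside read path: check Redis first, fall back to the DB on a miss,
# then populate the cache. (Write-through would also SET the key on create.)
# `redis` is a redis-py client; `db.query` is a placeholder helper returning
# a dict (or None) whose expires_at is a timezone-aware datetime.
def get_paste(paste_id):
    cached = redis.get(f"paste:{paste_id}")
    if cached:
        return json.loads(cached)

    paste = db.query("SELECT * FROM pastes WHERE id = ?", paste_id)
    if not paste:
        return None

    ttl = CACHE_TTL
    if paste["expires_at"]:
        # total_seconds(), not .seconds: .seconds drops the days component
        remaining = int((paste["expires_at"] - datetime.now(timezone.utc)).total_seconds())
        if remaining <= 0:
            return None  # expired (lazy expiration)
        ttl = min(CACHE_TTL, remaining)  # never cache past expiry

    # default=str serializes datetime columns
    redis.setex(f"paste:{paste_id}", ttl, json.dumps(paste, default=str))
    return paste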

Expiration Handling

  • Redis TTL: set the cache entry's TTL to at most the time remaining until the paste expires, so the cache can never serve an expired paste.
  • DB expiry: check expires_at on every read (lazy expiration). A background job (e.g., weekly) deletes expired rows and their S3 objects to reclaim storage.
  • Soft delete (alternative): mark expired pastes as deleted instead of removing them, which keeps an undelete option open at the cost of storage.
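A sketch of the background cleanup follows, with sqlite3 standing in for Postgres and a table reduced to the columns the job needs. It deletes expired rows in small batches so each transaction stays short, and it passes each `content_key` to a hypothetical `delete_object` callback so the matching S3 object can be removed too. Timestamps are stored as UTC ISO-8601 strings here, which compare correctly as text only because they all share one format.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_expired(conn, now, batch_size=500, delete_object=lambda key: None):
    """Delete expired pastes in batches; return the number of rows removed."""
    removed = 0
    while True:
        with conn:  # one short transaction per batch
            rows = conn.execute(
                "SELECT id, content_key FROM pastes "
                "WHERE expires_at IS NOT NULL AND expires_at < ? LIMIT ?",
                (now.isoformat(), batch_size),
            ).fetchall()
            if not rows:
                return removed
            conn.executemany("DELETE FROM pastes WHERE id = ?", [(r[0],) for r in rows])
        for _, key in rows:
            if key:
                delete_object(key)  # hypothetical: remove the S3 object as well
        removed += len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pastes (id TEXT PRIMARY KEY, content_key TEXT, expires_at TEXT)")
now = datetime.now(timezone.utc)
conn.executemany("INSERT INTO pastes VALUES (?, ?, ?)", [
    ("old1", "pastes/old1", (now - timedelta(days=1)).isoformat()),
    ("old2", None, (now - timedelta(hours=1)).isoformat()),
    ("live", None, (now + timedelta(days=1)).isoformat()),
    ("forever", None, None),
])
conn.commit()
deleted_keys = []
print(purge_expired(conn, now, batch_size=1, delete_object=deleted_keys.append))
```

Deleting the S3 object after the row commits can leave an orphaned object if the job crashes in between. That costs only storage and can be swept up later, whereas the reverse order could leave rows pointing at missing content.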

Access Control

  • Public: anyone can view, appears in search/explore
  • Unlisted: viewable by anyone with the link, not in search (like YouTube unlisted)
  • Private: only creator can view (requires auth check)

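For private pastes, validate the user's session on every read request, including cache hits. Either keep private pastes out of the shared cache or run the ownership check before returning a cached entry, so a cache hit can never bypass authorization.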
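A minimal sketch of the read-time authorization rule for the three visibility levels. The `Paste` dataclass and the `can_view`/`is_listed` helpers are hypothetical. `viewer_id` is assumed to come from an already-validated session, or to be `None` for anonymous users. The check is meant to run on every read, whether the paste came from Redis or the database.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Paste:
    id: str
    owner_id: Optional[int]   # None for anonymous pastes
    visibility: str           # 'public', 'unlisted', or 'private'

def can_view(paste: Paste, viewer_id: Optional[int]) -> bool:
    if paste.visibility in ("public", "unlisted"):
        return True  # anyone holding the link may read it
    if paste.visibility == "private":
        return viewer_id is not None and viewer_id == paste.owner_id
    raise ValueError(f"unknown visibility: {paste.visibility!r}")

def is_listed(paste: Paste) -> bool:
    """Only public pastes appear in search/explore."""
    return paste.visibility == "public"
```

Under this rule, an anonymous private paste could never be viewed by anyone. The create API should therefore reject `private` for requests without a logged-in user.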

Analytics (Optional)

  • View count per paste: Redis INCR, batch write to DB hourly
  • Popular pastes: update a sorted set with ZINCRBY on each view and read the top K with ZREVRANGE (or ZRANGE ... REV on Redis 6.2+)
  • Referrer tracking: publish the Referer header of each view to Kafka and aggregate it in a consumer
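The batching idea behind the view counter can be sketched in-process. Increments accumulate in memory (standing in for per-paste Redis counters) and are flushed to the database periodically, while top-K is read from running totals (standing in for the sorted set). The `ViewCounter` class and its `write_batch` callback are hypothetical, not a Redis API.

```python
from collections import Counter

class ViewCounter:
    def __init__(self):
        self.pending = Counter()   # views since last flush (like INCR keys)
        self.totals = Counter()    # all-time counts (like a sorted set)

    def record_view(self, paste_id: str) -> None:
        self.pending[paste_id] += 1
        self.totals[paste_id] += 1

    def flush(self, write_batch) -> None:
        """Hand accumulated deltas to write_batch (e.g., one UPDATE per paste), then reset."""
        if self.pending:
            write_batch(dict(self.pending))
            self.pending.clear()

    def top_k(self, k: int):
        return self.totals.most_common(k)

vc = ViewCounter()
for pid in ["a", "b", "a", "c", "a", "b"]:
    vc.record_view(pid)
batches = []
vc.flush(batches.append)  # e.g., called by an hourly job
print(vc.top_k(2))
```

Flushing deltas rather than absolute counts lets several counter shards write to the same row with `view_count = view_count + ?` without overwriting each other.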

Interview Tips

  • This is essentially a URL shortener plus content storage; aim to cover the core design cleanly in about 20 minutes
  • Discuss storage choice: inline for small pastes, S3 for large
  • Expiration: lazy check on read + background cleanup
  • Caching with Redis is essential given 10:1 read:write ratio
  • Mention CDN for public pastes to reduce DB load
  • Discuss access control: public vs unlisted vs private
