System Design Interview: Design a Pastebin / Code Snippet Service
Designing Pastebin is a classic beginner-to-intermediate system design question that covers short URL generation, content storage, access control, and expiration. The same design applies to code snippet sharing tools such as GitHub Gist and Carbon.
Requirements Clarification
Functional Requirements
- Create a paste (text/code) and get a short shareable URL
- View a paste by its short URL
- Optional: set expiration time (1 hour, 1 day, 1 week, never)
- Optional: set visibility (public, unlisted, private)
- Optional: syntax highlighting by language
- User accounts for managing own pastes
Non-Functional Requirements
- Scale: 10M pastes/day, 100M reads/day (10:1 read:write ratio)
- Storage: pastes up to 10MB, average 10KB, retain for up to 10 years
- Read latency: <100ms
- Availability: 99.9%
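A quick back-of-envelope estimate helps size the system before choosing components. The sketch below derives average throughput and storage from the numbers above; it assumes decimal units (1KB = 1,000 bytes), that no paste expires early, and it ignores replication and peak traffic, so treat the results as rough figures only.

```python
# Back-of-envelope capacity estimate from the stated requirements.
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 10_000_000
reads_per_day = 100_000_000
avg_paste_bytes = 10 * 1000   # 10KB average, decimal units (assumption)
retention_days = 10 * 365     # 10-year retention, ignoring leap days

writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
storage_per_day_gb = writes_per_day * avg_paste_bytes / 1e9
total_storage_tb = storage_per_day_gb * retention_days / 1000
total_pastes = writes_per_day * retention_days

print(f"average writes/sec: {writes_per_sec:.0f}")
print(f"average reads/sec:  {reads_per_sec:.0f}")
print(f"new storage/day:    {storage_per_day_gb:.0f} GB")
print(f"storage, 10 years:  {total_storage_tb:.0f} TB")
print(f"pastes, 10 years:   {total_pastes:,}")
```

The total paste count matters for the ID length chosen in the next section: the ID space has to be much larger than the number of pastes ever stored.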
Short URL Generation
Generate a unique 6-8 character alphanumeric ID for each paste.
Option 1: Random ID
import secrets, string
def generate_id(length=7):
    chars = string.ascii_letters + string.digits  # 62 chars
    return ''.join(secrets.choice(chars) for _ in range(length))
# 62^7 ≈ 3.5 trillion unique IDs - sufficient
# Collisions are rare but possible: rely on the primary-key constraint and retry with a new ID
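Checking for a collision and then inserting leaves a race window between the two steps. One common alternative, sketched here, is to attempt the insert and retry on a primary-key violation. The example uses an in-memory SQLite table as a stand-in for the real paste database; the `insert_paste` helper and its `id_factory` parameter are illustrative names, not part of any existing API.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # 62 chars

def generate_id(length=7):
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

def insert_paste(conn, content, max_attempts=5, id_factory=generate_id):
    """Insert a paste, retrying with a fresh ID if the primary key collides."""
    for _ in range(max_attempts):
        paste_id = id_factory()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO pastes (id, content) VALUES (?, ?)",
                    (paste_id, content),
                )
            return paste_id
        except sqlite3.IntegrityError:
            continue  # ID already taken: try another one
    raise RuntimeError("could not allocate a unique paste ID")

# In-memory table standing in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pastes (id TEXT PRIMARY KEY, content TEXT)")
first = insert_paste(conn, "print('hello')")
print(first)
```

Because the database enforces uniqueness, two servers that happen to generate the same ID cannot both succeed; the loser simply retries.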
Option 2: Hash-Based
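import hashlib, string, time
BASE62 = string.digits + string.ascii_letters  # 62 symbols
def generate_id(content, user_id, length=7):
    data = content + str(user_id) + str(time.time())
    num = int.from_bytes(hashlib.md5(data.encode()).digest(), "big")
    # Base62-encode the digest; hexdigest()[:7] would allow only 16^7 ≈ 268M IDs
    return ''.join(BASE62[(num // 62**i) % 62] for i in range(length))
# Birthday problem: n random IDs in a space of N collide about n^2 / (2N) times
# At 10M/day with 62^7 IDs: roughly 14 collisions on day one, so retry on a unique-key violation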
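The birthday-problem estimate can be checked numerically. The sketch below uses the standard approximation that n random IDs drawn from a space of N values produce about n(n-1)/(2N) colliding pairs; the function names are illustrative, and the figures cover only the first day's 10M pastes in an otherwise empty table.

```python
import math

def expected_collisions(n_ids, keyspace):
    """Birthday approximation: expected colliding pairs among n random IDs."""
    return n_ids * (n_ids - 1) / (2 * keyspace)

def p_any_collision(n_ids, keyspace):
    """Approximate probability that at least one collision occurs."""
    return 1 - math.exp(-expected_collisions(n_ids, keyspace))

daily = 10_000_000
candidates = [
    ("7 hex chars", 16**7),
    ("7 base62 chars", 62**7),
    ("8 base62 chars", 62**8),
]
for label, keyspace in candidates:
    pairs = expected_collisions(daily, keyspace)
    print(f"{label:>15}: keyspace={keyspace:,} expected collisions={pairs:,.2f}")
```

Note how much the alphabet matters: seven hexadecimal characters give a far smaller ID space than seven base62 characters, and the collision rate keeps rising as the table fills, which is why a uniqueness constraint plus retry is still needed.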
Option 3: Distributed ID Generator
Pre-generate batches of unique IDs from a central counter service, or embed a timestamp and worker ID in each ID as Twitter Snowflake does. Either way IDs never collide, so no retry logic is needed. More complex, but necessary at very high scale.
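As a rough illustration of the counter-batch idea (not Snowflake's timestamp-based layout), each API server can lease a block of integers from a central counter and hand them out locally as base62 strings. `CounterService` and `IdAllocator` are hypothetical names, and the in-process counter stands in for something durable such as a database sequence.

```python
import string
import threading

BASE62 = string.digits + string.ascii_letters

def encode_base62(n, min_length=7):
    """Encode a non-negative integer as base62, left-padded with '0'."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return ''.join(reversed(digits)).rjust(min_length, BASE62[0])

class CounterService:
    """Stand-in for a central, durable counter (hypothetical)."""
    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def lease(self, block_size):
        """Reserve [start, end) and return the bounds."""
        with self._lock:
            start = self._next
            self._next += block_size
        return start, start + block_size

class IdAllocator:
    """Per-server allocator that hands out IDs from a leased block."""
    def __init__(self, counter, block_size=1000):
        self._counter = counter
        self._block_size = block_size
        self._current, self._end = 0, 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._current >= self._end:  # block exhausted: lease another
                self._current, self._end = self._counter.lease(self._block_size)
            value = self._current
            self._current += 1
        return encode_base62(value)

counter = CounterService()
server_a = IdAllocator(counter, block_size=3)
server_b = IdAllocator(counter, block_size=3)
print([server_a.next_id() for _ in range(4)], [server_b.next_id() for _ in range(4)])
```

One trade-off worth raising in an interview: sequential IDs are guessable, which weakens unlisted pastes that rely on an unguessable URL, so a production design would likely scramble or randomize the encoded value.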
High-Level Architecture
                User
                  |
            Load Balancer
                  |
             API Service
              |        |
           Create     Read
              |        |
           ID Gen    Cache (Redis)
              |        | miss
          Paste DB   Paste DB
         (Postgres)  (read replica)
                  |
          Object Store (S3)
       (for large pastes >1KB)
Storage Design
Database Schema
pastes:
  id           VARCHAR(8) PRIMARY KEY
  user_id      INT (nullable for anonymous)
  title        VARCHAR(255)
  language     VARCHAR(50)   -- syntax highlighting
  visibility   ENUM('public', 'unlisted', 'private')
  size_bytes   INT
  content_key  VARCHAR(255)  -- S3 key if stored externally
  content      TEXT          -- inline if <1KB
  created_at   TIMESTAMP
  expires_at   TIMESTAMP (nullable)
Hybrid Storage
- Small pastes (<1KB): store inline in DB TEXT column for fast retrieval
- Large pastes (1KB and above): store in S3, keep content_key in DB
- CDN in front of S3 for public pastes (pastes are immutable, so cache aggressively, but keep the TTL within the paste's remaining lifetime)
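The inline-versus-object-store decision can be sketched as a small write/read pair. This is a minimal illustration under assumptions: `InMemoryObjectStore` is a hypothetical stand-in for S3, the row is a plain dict mirroring the schema columns, and the `pastes/<id>` key format is invented for the example.

```python
INLINE_LIMIT_BYTES = 1024  # threshold from the text; tune per workload

class InMemoryObjectStore:
    """Hypothetical stand-in for S3: a dict keyed by object key."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

def build_row(paste_id, text, store):
    """Return the DB row for a paste, offloading large content to the store."""
    data = text.encode("utf-8")
    row = {"id": paste_id, "size_bytes": len(data),
           "content": None, "content_key": None}
    if len(data) < INLINE_LIMIT_BYTES:
        row["content"] = text              # small: keep inline in the DB
    else:
        key = f"pastes/{paste_id}"         # illustrative key format
        store.put(key, data)
        row["content_key"] = key           # large: DB keeps only the pointer
    return row

def load_content(row, store):
    """Read content from the row if inline, otherwise from the object store."""
    if row["content"] is not None:
        return row["content"]
    return store.get(row["content_key"]).decode("utf-8")

store = InMemoryObjectStore()
small = build_row("abc1234", "print('hi')", store)
large = build_row("xyz9876", "x" * 5000, store)
print(small["content_key"], large["content_key"])
```

Measuring the size in encoded bytes rather than characters keeps the threshold meaningful for non-ASCII pastes.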
Caching Strategy
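# Cache paste by ID in Redis
# Reads are cache-aside: check the cache first, fill it on a miss.
# The create path can additionally write through to the cache.
# redis, db, and now() are assumed helpers; paste must be JSON-serializable.
def get_paste(paste_id):
    cached = redis.get(f"paste:{paste_id}")
    if cached: return json.loads(cached)
    paste = db.query("SELECT * FROM pastes WHERE id = ?", paste_id)
    if not paste: return None
    ttl = 3600
    if paste.expires_at:
        # total_seconds() counts whole days too; .seconds would drop them
        remaining = int((paste.expires_at - now()).total_seconds())
        if remaining <= 0: return None  # expired
        ttl = min(ttl, remaining)  # never cache past the paste's expiry
    redis.setex(f"paste:{paste_id}", ttl, json.dumps(paste))
    return paste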
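The write-through half of the strategy (caching on create) might look like the sketch below. It is only an illustration: `FakeCache` is an in-memory stand-in that mimics the shape of Redis GET/SETEX, a plain dict stands in for the pastes table, and expiry times are Unix timestamps in seconds.

```python
import json
import time

class FakeCache:
    """In-memory stand-in for Redis GET/SETEX (hypothetical, for illustration)."""
    def __init__(self):
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if time.time() >= deadline:
            del self._data[key]
            return None
        return value

MAX_CACHE_TTL = 3600  # one hour, as in the read path

def cache_ttl(expires_at, now):
    """TTL in seconds: at most one hour, never past the paste's expiry."""
    if expires_at is None:
        return MAX_CACHE_TTL
    return min(MAX_CACHE_TTL, int(expires_at - now))

def create_paste(db, cache, paste_id, content, expires_at=None, now=None):
    """Write the paste to the DB, then write through to the cache."""
    now = time.time() if now is None else now
    paste = {"id": paste_id, "content": content, "expires_at": expires_at}
    db[paste_id] = paste  # dict stands in for the pastes table
    ttl = cache_ttl(expires_at, now)
    if ttl > 0:  # skip caching pastes that are already expired
        cache.setex(f"paste:{paste_id}", ttl, json.dumps(paste))
    return paste

db, cache = {}, FakeCache()
create_paste(db, cache, "abc1234", "print('hi')", expires_at=time.time() + 600)
print(cache.get("paste:abc1234"))
```

Writing to the database first and the cache second means a failed cache write only costs a later cache miss, not a lost paste.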
Expiration Handling
- Redis TTL: set the TTL on each cache entry to no more than the paste's remaining lifetime, so the cached copy disappears by the time the paste expires.
- DB expiry: check expires_at on every read (lazy expiration). A background cron job deletes expired rows (and their S3 objects) weekly to reclaim storage.
- Soft delete (alternative): mark expired pastes instead of deleting them immediately, if an undelete feature is wanted.
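The background cleanup can be sketched as a batched delete, which keeps each transaction short instead of removing millions of rows at once. The example runs against an in-memory SQLite table as a stand-in for the real database; `purge_expired` is an illustrative name and expiry times are stored as Unix timestamps for simplicity.

```python
import sqlite3

def purge_expired(conn, now, batch_size=1000):
    """Delete expired rows in small batches; returns the number removed."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM pastes WHERE id IN ("
                "  SELECT id FROM pastes"
                "  WHERE expires_at IS NOT NULL AND expires_at < ?"
                "  LIMIT ?)",
                (now, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

# Demo data: three expired pastes, one live paste, one that never expires.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pastes (id TEXT PRIMARY KEY, expires_at REAL)")
conn.executemany(
    "INSERT INTO pastes VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 30), ("d", 500), ("e", None)],
)
removed = purge_expired(conn, now=100, batch_size=2)
print(removed)
```

A real job would also delete the matching objects from the object store, or lean on the store's own lifecycle rules, and would benefit from an index on expires_at.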
Access Control
- Public: anyone can view, appears in search/explore
- Unlisted: viewable by anyone with the link, not in search (like YouTube unlisted)
- Private: only creator can view (requires auth check)
For private pastes, validate user session on every read request. Do not cache private pastes in shared cache without user isolation.
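The visibility rules reduce to a small authorization check. The sketch below is one possible shape, assuming pastes are dicts with `visibility` and `user_id` fields as in the schema; `can_view` and `is_shared_cacheable` are hypothetical helper names.

```python
def can_view(paste, viewer_id=None):
    """Return True if the viewer may read the paste."""
    if paste["visibility"] in ("public", "unlisted"):
        return True  # anyone with the link can read
    # Private: only the authenticated creator may read.
    return viewer_id is not None and viewer_id == paste["user_id"]

def is_shared_cacheable(paste):
    """Private pastes should stay out of caches shared across users."""
    return paste["visibility"] != "private"

def is_listed(paste):
    """Only public pastes appear in search and explore pages."""
    return paste["visibility"] == "public"

private_paste = {"id": "abc1234", "visibility": "private", "user_id": 42}
print(can_view(private_paste, viewer_id=42), can_view(private_paste))
```

Running the check on every read, including cache hits, keeps a cached private paste from leaking to another user.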
Analytics (Optional)
- View count per paste: Redis INCR, batch write to DB hourly
- Popular pastes: sorted set by view count, ZREVRANGE for top-K (ZRANGE with the REV option on Redis 6.2+)
- Referrer tracking: log referrer header, aggregate by Kafka consumer
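The buffer-then-flush pattern for view counts can be illustrated without Redis. In this sketch a `Counter` plays the role of the per-paste INCR keys, a second `Counter` stands in for the database totals, and `heapq.nlargest` approximates a top-K query on a sorted set; all class and function names are illustrative.

```python
import heapq
from collections import Counter

class ViewCounter:
    """Buffers view counts in memory and flushes them in batches."""
    def __init__(self):
        self._pending = Counter()

    def record_view(self, paste_id):
        self._pending[paste_id] += 1  # analogous to INCR on a per-paste key

    def flush(self, totals):
        """Add pending counts to `totals` (a Counter) and reset the buffer."""
        flushed = len(self._pending)
        totals.update(self._pending)  # Counter.update adds counts
        self._pending.clear()
        return flushed

def top_k(totals, k):
    """Return the k most-viewed (paste_id, views) pairs, highest first."""
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])

views = ViewCounter()
for paste_id in ["a", "b", "a", "c", "a", "c"]:
    views.record_view(paste_id)

db_totals = Counter({"a": 10})  # counts already persisted (demo data)
views.flush(db_totals)
print(top_k(db_totals, 2))
```

Batching trades a little freshness for far fewer database writes, which suits a counter that does not need to be exact in real time.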
Interview Tips
- This is a simpler URL shortener variant – design it cleanly in 20 minutes
- Discuss storage choice: inline for small pastes, S3 for large
- Expiration: lazy check on read + background cleanup
- Caching with Redis is essential given 10:1 read:write ratio
- Mention CDN for public pastes to reduce DB load
- Discuss access control: public vs unlisted vs private