What Is an Online Judge?
An online judge (LeetCode, HackerRank, Codeforces) accepts code submissions in multiple languages, executes them against test cases, and returns results: Accepted, Wrong Answer, Time Limit Exceeded, Memory Limit Exceeded, Runtime Error, or Compilation Error. The core challenges: safe execution of untrusted code (sandboxing), scalability under submission spikes, and accurate verdict computation.
Code Execution and Sandboxing
Untrusted user code can fork-bomb the system, read sensitive files, make network calls, or consume unbounded memory. Isolation layers: (1) Container isolation: run each submission in a Docker container with no network access, a read-only filesystem, and resource limits (CPU: 1 core, memory: 256MB). (2) Seccomp (Secure Computing Mode): allow only the system calls needed for computation (read, write, exit) and block fork, exec, socket, and open. (3) Namespace isolation: separate PID, network, and mount namespaces. (4) Time limit enforcement: have the supervising judge process kill the submission after a wall-clock timeout (e.g., 2 seconds) and cap CPU time with RLIMIT_CPU. A SIGALRM set inside the process can be caught or ignored by the submission, and a cgroups CPU quota only throttles the process rather than killing it. (5) Memory limit: a cgroups memory limit (memory.limit_in_bytes in cgroup v1, memory.max in v2) makes the kernel OOM-kill the process when it exceeds the cap. Multiple layers provide defense in depth: even if the submission escapes one layer, such as the container, the seccomp filter still blocks dangerous syscalls.
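The time and memory limits described above can be illustrated at the process level. The following is a minimal sketch, assuming a Linux host and Python's standard resource module; the function name run_limited and its parameters are hypothetical, and this covers only the limit-enforcement layer, not the container, namespace, or seccomp layers.

```python
import resource
import signal
import subprocess
import sys


def run_limited(cmd, stdin_text="", cpu_seconds=2,
                memory_bytes=256 * 1024 * 1024, wall_seconds=5):
    """Run cmd under CPU-time and address-space limits (Linux/Unix only).

    Returns (status, stdout), where status is "OK", "Time Limit Exceeded",
    or "Runtime Error".
    """

    def apply_limits():
        # Executed in the child process between fork() and exec().
        # The soft CPU limit delivers SIGXCPU; the hard limit, one second
        # later, delivers SIGKILL in case the program handles SIGXCPU.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        # Cap the virtual address space so oversized allocations fail.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            cmd, input=stdin_text, capture_output=True, text=True,
            timeout=wall_seconds, preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        # Wall-clock timeout enforced by the supervisor; this also catches
        # programs that sleep instead of burning CPU.
        return "Time Limit Exceeded", ""

    if proc.returncode in (-signal.SIGXCPU, -signal.SIGKILL):
        return "Time Limit Exceeded", proc.stdout
    if proc.returncode != 0:
        return "Runtime Error", proc.stdout
    return "OK", proc.stdout


if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print(1 + 1)"]))
```

The supervisor, not the submission, owns the timer here, so the submitted program cannot disable it. A production judge would likely distinguish Memory Limit Exceeded from other runtime errors using cgroup accounting, which this sketch does not attempt.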
Execution Pipeline
Submission flow: user submits code -> API server validates (language supported, code length within the limit) -> enqueues to a message queue (Kafka or SQS) per language -> judge worker picks up the job -> worker spins up a Docker container -> compiles (if compiled language) -> runs against each test case -> collects results -> sends verdict back via a result queue -> API server stores result and updates the user’s submission history.
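The flow above can be sketched with in-memory queues standing in for Kafka or SQS. This is only an illustration: the names Submission, submit, and worker_step, the supported-language set, and the 64 KiB code-size limit are all assumptions, not values from any particular platform.

```python
import queue
from dataclasses import dataclass

SUPPORTED_LANGUAGES = {"cpp", "java", "python"}   # assumed set
MAX_CODE_BYTES = 64 * 1024                        # assumed size limit


@dataclass
class Submission:
    submission_id: int
    language: str
    code: str


# One job queue per language, plus a shared result queue.
job_queues = {lang: queue.Queue() for lang in SUPPORTED_LANGUAGES}
result_queue = queue.Queue()


def submit(sub):
    """API-server side: validate, then enqueue on the language's queue."""
    if sub.language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {sub.language}")
    if len(sub.code.encode("utf-8")) > MAX_CODE_BYTES:
        raise ValueError("code too long")
    job_queues[sub.language].put(sub)


def worker_step(language, judge):
    """Worker side: take one job, judge it, publish the verdict.

    `judge` is any callable mapping a Submission to a verdict string;
    in a real worker it would compile and run the code in a sandbox.
    """
    sub = job_queues[language].get()
    verdict = judge(sub)
    result_queue.put((sub.submission_id, verdict))
```

For example, after `submit(Submission(1, "python", "print(1)"))`, a call to `worker_step("python", lambda s: "Accepted")` places `(1, "Accepted")` on the result queue for the API server to store.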
Test cases: each problem has N test cases (typically 50-200). Run the test cases in order and stop at the first failure, returning that verdict (or Accepted if all pass). For efficiency, run a few lightweight test cases first for fast feedback and run heavy test cases last. Test cases are stored in object storage (S3), and workers download them at job start. Cache test cases for popular problems in worker local storage.
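A minimal sketch of the verdict loop described above follows. It assumes each test case carries a numeric weight used to order cheap cases first, and that outputs are compared after stripping surrounding whitespace; both the weight field and the comparison rule are assumptions for illustration.

```python
def judge(run_case, test_cases):
    """Return (verdict, failing_case_number) for a submission.

    run_case:   callable taking the test input and returning program output.
    test_cases: list of (input_text, expected_output, weight) tuples, where
                a lower weight means a cheaper case.
    """
    # Run lightweight cases first so obvious failures surface quickly.
    ordered = sorted(test_cases, key=lambda case: case[2])
    for number, (input_text, expected, _weight) in enumerate(ordered, start=1):
        actual = run_case(input_text)
        if actual.strip() != expected.strip():
            # Stop at the first failure; later cases are never run.
            return ("Wrong Answer", number)
    return ("Accepted", None)
```

Stopping at the first failure saves worker time on incorrect submissions, which usually fail an early case.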
Language Support
Interpreted languages (Python, JavaScript): there is no separate compile step; execute the source directly. Compiled languages (C++, Java, Go): compile first, report Compilation Error if it fails, then run the resulting binary (or, for Java, the bytecode on the JVM). Per-language containers: each language has a dedicated base Docker image with the compiler/runtime pre-installed. Container pooling: pre-warm N containers per language to avoid cold-start overhead on each submission. Return containers to the pool after execution (reset the filesystem). Language-specific time limit adjustments: Python is often several times slower than C++ for the same algorithm, so set per-language time limits (e.g., C++ 1s, Python 3s).
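One way to express the per-language differences is a small configuration table. The sketch below is illustrative: the LanguageSpec type, the file names, and the Java factor of 2.0 are assumptions, while the C++ and Python factors mirror the 1s and 3s example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LanguageSpec:
    compile_cmd: Optional[List[str]]  # None for interpreted languages
    run_cmd: List[str]
    time_factor: float                # multiplier on the problem's base limit


LANGUAGES = {
    "cpp": LanguageSpec(["g++", "-O2", "-o", "main", "main.cpp"], ["./main"], 1.0),
    "java": LanguageSpec(["javac", "Main.java"], ["java", "Main"], 2.0),  # assumed factor
    "python": LanguageSpec(None, ["python3", "main.py"], 3.0),
}


def needs_compile(language):
    """True if the language has a separate compile step."""
    return LANGUAGES[language].compile_cmd is not None


def time_limit_seconds(language, base_seconds):
    """Scale the problem's base time limit for the given language."""
    return base_seconds * LANGUAGES[language].time_factor
```

Keeping the multiplier in configuration rather than in worker code lets problem setters tune limits per language without redeploying the judge.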
Scalability
Contest mode: thousands of simultaneous submissions arrive at the start of a contest. Scale judge workers horizontally: auto-scale the worker pool based on queue depth. Separate queues per language prevent a Python submission spike from delaying C++ submissions. Priority queue: submissions for paid users or during contests get higher priority. Judge worker isolation: each worker should run only one submission at a time, because judging is CPU-bound and over-scheduling degrades performance (and timing accuracy) for all submissions. Rough sizing: if 1 worker core handles about 10 submissions/minute, then 1000 submissions/minute needs at least 100 worker cores. Use spot instances for judge workers (roughly 70% cheaper; evictions are acceptable as long as interrupted jobs are re-enqueued).
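The sizing arithmetic and queue-depth scaling above can be written out as a short sketch. The 10 submissions/minute per core figure comes from the text; the function names, the one-minute drain target, and the minimum and maximum worker bounds are assumptions for illustration.

```python
import math


def required_cores(submissions_per_minute, per_core_rate=10):
    """Cores needed to keep up with a steady submission rate."""
    return math.ceil(submissions_per_minute / per_core_rate)


def desired_workers(queue_depth, per_core_rate=10, drain_minutes=1,
                    min_workers=2, max_workers=500):
    """Workers needed to drain the current backlog within drain_minutes,
    clamped to the pool's configured bounds."""
    needed = math.ceil(queue_depth / (per_core_rate * drain_minutes))
    return max(min_workers, min(max_workers, needed))
```

With these defaults, `required_cores(1000)` gives 100, matching the estimate above, and a backlog of 250 queued jobs asks for 25 workers.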
Result Delivery
Async results: submissions are processed asynchronously. The frontend polls or uses WebSocket to receive the verdict when ready. Client-side: show “Judging…” with progress updates. Server push: when the verdict is ready, push via WebSocket to the client’s browser. Store all submissions in a database: (submission_id, user_id, problem_id, language, code, verdict, runtime_ms, memory_mb, submitted_at). User can view their submission history and replay any submission.
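The submission record listed above maps directly onto a table. The sketch below uses an in-memory SQLite database purely for illustration; the column types, the index, and the helper names are assumptions, and a production system would likely use a server database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE submissions (
    submission_id INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    problem_id    INTEGER NOT NULL,
    language      TEXT    NOT NULL,
    code          TEXT    NOT NULL,
    verdict       TEXT,              -- NULL while the submission is judging
    runtime_ms    INTEGER,
    memory_mb     REAL,
    submitted_at  TEXT    NOT NULL   -- ISO 8601 timestamp
);
CREATE INDEX idx_user_time ON submissions (user_id, submitted_at);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_submission(conn, user_id, problem_id, language, code, submitted_at):
    """Insert a pending submission and return its id."""
    cur = conn.execute(
        "INSERT INTO submissions (user_id, problem_id, language, code, submitted_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, problem_id, language, code, submitted_at),
    )
    conn.commit()
    return cur.lastrowid


def set_verdict(conn, submission_id, verdict, runtime_ms, memory_mb):
    """Fill in the result once the judge worker reports back."""
    conn.execute(
        "UPDATE submissions SET verdict = ?, runtime_ms = ?, memory_mb = ? "
        "WHERE submission_id = ?",
        (verdict, runtime_ms, memory_mb, submission_id),
    )
    conn.commit()


def history(conn, user_id):
    """Newest-first submission history for one user."""
    return conn.execute(
        "SELECT submission_id, problem_id, verdict FROM submissions "
        "WHERE user_id = ? ORDER BY submitted_at DESC",
        (user_id,),
    ).fetchall()
```

A polling client can call something like `history` until the verdict column is no longer NULL, which keeps the asynchronous design visible in the data model.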
Interview Tips
- Sandboxing is the core design challenge. Mention at least two isolation layers (Docker + seccomp) — single-layer isolation is insufficient for truly untrusted code.
- The job queue + worker pool pattern is standard. Emphasize that workers are stateless (any worker can handle any submission) — this enables horizontal scaling.
- Test case management: test cases are the intellectual property of the platform. Store encrypted in S3; workers decrypt locally. Do not transmit test case outputs to clients (prevents reverse-engineering).