Question 1

How do you design an intelligent ticket routing system?

Accepted Answer

Ticket routing assigns each incoming support ticket to the right agent or queue. There are two basic approaches, rule-based and ML-based, and they are often combined. Rule-based: if category=billing, route to the Billing team; if priority=P1, route to the Senior team. Define the routing rules in a config table of conditions and target queues. ML-based: train a classifier on historical tickets (text -> category -> team), using TF-IDF or sentence embeddings with logistic regression to predict the category, then map the category to a team. Hybrid: ML classifies the category and rules map the category to a team. Skills-based routing: match the ticket's required_skills against each agent's skill_tags. Load balancing within a team: assign the ticket to the agent with the fewest open tickets. Overflow: if a queue exceeds its depth threshold, escalate to a higher-tier team.
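The rule-based path, skills matching, load balancing, and overflow can be sketched as follows. This is a minimal illustration, not a reference implementation: the rule order (first match wins, P1 checked before category), the overflow mapping, the threshold of 50, and the names pick_team and pick_agent are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    team: str
    skill_tags: set[str] = field(default_factory=set)
    open_tickets: int = 0


# Hypothetical routing table: (conditions, target team). Checked in order,
# first match wins. In a real system this would live in a config table.
ROUTING_RULES = [
    ({"priority": "P1"}, "senior"),
    ({"category": "billing"}, "billing"),
]
DEFAULT_TEAM = "general"                                   # assumed fallback queue
OVERFLOW_TEAM = {"billing": "senior", "general": "senior"}  # assumed higher tier
MAX_QUEUE_DEPTH = 50                                       # assumed depth threshold


def pick_team(ticket: dict, queue_depth: dict[str, int]) -> str:
    """Apply the first matching rule, then overflow if the queue is too deep."""
    team = DEFAULT_TEAM
    for conditions, target in ROUTING_RULES:
        if all(ticket.get(key) == value for key, value in conditions.items()):
            team = target
            break
    if queue_depth.get(team, 0) > MAX_QUEUE_DEPTH and team in OVERFLOW_TEAM:
        team = OVERFLOW_TEAM[team]
    return team


def pick_agent(ticket: dict, team: str, agents: list[Agent]) -> Agent | None:
    """Among agents on the team who have every required skill,
    return the one with the fewest open tickets (None if nobody qualifies)."""
    required = set(ticket.get("required_skills", []))
    candidates = [a for a in agents if a.team == team and required <= a.skill_tags]
    return min(candidates, key=lambda a: a.open_tickets, default=None)
```

In the hybrid setup, the ticket's category field would be filled in by the classifier before pick_team runs; the rules themselves stay unchanged.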

Question 2

How do you implement SLA tracking and breach prevention?

Accepted Answer

An SLA defines response and resolution time targets by priority: P1 = 1h response / 4h resolution, P2 = 4h response / 24h resolution, P3 = 24h response / 72h resolution. Store sla_response_deadline and sla_resolution_deadline on each ticket, computed at creation time (for example, sla_response_deadline = NOW() + priority_config.response_hours). Track first_response_at (set when the agent sends the first message) and resolved_at. SLA status: WITHIN (not breached and not near the deadline), AT_RISK (within 20% of the deadline), BREACHED (past the deadline). A background job runs every 5 minutes and sets sla_status=AT_RISK on tickets whose sla_response_deadline is approaching.

Escalation moves a ticket up through the support tiers: L1 -> L2 (technical) -> L3 (engineering). Store the escalation history as (ticket_id, from_tier, to_tier, reason, escalated_by, escalated_at). On escalation: notify the new tier, set a new SLA based on the escalated priority, and remove the ticket from the original agent's queue. Auto-escalation: a scheduled job checks for tickets near SLA breach and escalates them automatically with reason=AUTO_SLA_RISK. Track the escalation rate per team as a quality metric; a high escalation rate from L1 indicates training gaps.
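A minimal sketch of the deadline computation, the status check, and an escalation record. It assumes that "within 20% of deadline" means 20% or less of the SLA window remains, and that a target already met is judged at the time it was met. The function names, the dict-shaped ticket, and the TIERS list are illustrative, not part of any particular product.

```python
from datetime import datetime, timedelta, timezone

# (response_hours, resolution_hours) per priority, from the targets above.
SLA_HOURS = {"P1": (1, 4), "P2": (4, 24), "P3": (24, 72)}
AT_RISK_FRACTION = 0.2          # assumed reading of "within 20% of deadline"
TIERS = ["L1", "L2", "L3"]


def sla_deadlines(priority, created_at):
    """Return (sla_response_deadline, sla_resolution_deadline)."""
    response_h, resolution_h = SLA_HOURS[priority]
    return (created_at + timedelta(hours=response_h),
            created_at + timedelta(hours=resolution_h))


def sla_status(created_at, deadline, now, completed_at=None):
    """Classify one SLA target (response or resolution).

    completed_at is first_response_at or resolved_at, if already set.
    """
    checkpoint = completed_at if completed_at is not None else now
    if checkpoint > deadline:
        return "BREACHED"
    if completed_at is not None:
        return "WITHIN"            # target met in time
    window = deadline - created_at
    if deadline - now <= window * AT_RISK_FRACTION:
        return "AT_RISK"
    return "WITHIN"


def escalate(ticket, reason, escalated_by, now):
    """Move the ticket up one tier and return the history record,
    or None if it is already at the top tier."""
    index = TIERS.index(ticket["tier"])
    if index == len(TIERS) - 1:
        return None
    record = {
        "ticket_id": ticket["id"],
        "from_tier": TIERS[index],
        "to_tier": TIERS[index + 1],
        "reason": reason,
        "escalated_by": escalated_by,
        "escalated_at": now,
    }
    ticket["tier"] = record["to_tier"]
    return record
```

The periodic job would call sla_status for each open ticket and, for AT_RISK results, call escalate with reason="AUTO_SLA_RISK". Notification and recomputing the SLA after escalation are left out of the sketch.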

Question 3

How do you implement a knowledge base for agent-assisted resolution?

Accepted Answer

A knowledge base stores articles indexed by keywords and categories. Integration with ticketing: when a new ticket arrives, match its text against the articles (TF-IDF keywords or embedding similarity) and surface the top 3 relevant articles to the agent. The agent can insert an article link into the reply with one click. Track article effectiveness: when an agent uses article X and the ticket is resolved without further escalation, increment article.success_count. Articles with a low success rate are flagged for review. For self-service deflection, show relevant articles to customers before they submit a ticket, and track the deflection rate (the customer viewed an article and did not submit a ticket). Articles are versioned; when one is updated, notify agents of the change. Categorize articles by product area so search can be filtered.
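The TF-IDF variant of the article suggestion step can be sketched in plain Python. This is an illustration under simple assumptions: whitespace-style tokenization, a smoothed IDF, cosine similarity for ranking, and articles passed in as a dict of id to text. The name suggest_articles is hypothetical, and a production system would more likely use a search index or an embedding store.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def suggest_articles(ticket_text, articles, k=3):
    """Return up to k article ids ranked by similarity to the ticket text.
    Articles that share no terms with the ticket are dropped."""
    ids = list(articles)
    vectors, idf = tfidf_vectors([articles[i] for i in ids])
    counts = Counter(tokenize(ticket_text))
    query = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    scored = sorted(((cosine(query, vec), i) for i, vec in zip(ids, vectors)),
                    reverse=True)
    return [i for score, i in scored[:k] if score > 0]
```

For a ticket reading "I forgot my password and cannot login", an article about login and password errors would rank above one that only mentions resetting a password, because it shares more weighted terms with the ticket.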

Question 4

How do you track and report customer support metrics?

Accepted Answer

Core metrics: First Response Time (FRT): time from ticket creation to the first agent message. Average Handle Time (AHT): time from first response to resolution. Customer Satisfaction Score (CSAT): post-resolution survey (1-5 rating). First Contact Resolution (FCR): percentage of tickets resolved without escalation. SLA compliance rate: percentage of tickets resolved within SLA. Compute these metrics in a reporting database rather than the transactional DB, to avoid running joins over large tables in production. A nightly ETL job aggregates raw ticket events into a summary table: (date, team, priority, avg_frt_minutes, avg_aht_minutes, csat_avg, fcr_rate, sla_compliance_rate). Dashboards query the summary table. Real-time metrics (current queue depth, tickets at risk) are computed from Redis counters updated on each ticket state change.
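As a rough sketch of what the nightly aggregation computes for one (date, team, priority) group, using the metric definitions above. The ticket field names (created_at, first_response_at, resolved_at, sla_resolution_deadline, escalated, csat) and the function name summarize are assumptions for the example; grouping and writing the row to the summary table are left out.

```python
from datetime import datetime
from statistics import mean


def _minutes(start, end):
    return (end - start).total_seconds() / 60


def _avg(values):
    values = list(values)
    return mean(values) if values else None


def summarize(tickets):
    """Compute one summary row from a list of ticket dicts."""
    responded = [t for t in tickets if t.get("first_response_at")]
    resolved = [t for t in tickets if t.get("resolved_at")]
    handled = [t for t in resolved if t.get("first_response_at")]
    return {
        # FRT: creation -> first agent message
        "avg_frt_minutes": _avg(_minutes(t["created_at"], t["first_response_at"])
                                for t in responded),
        # AHT as defined above: first response -> resolution
        "avg_aht_minutes": _avg(_minutes(t["first_response_at"], t["resolved_at"])
                                for t in handled),
        "csat_avg": _avg(t["csat"] for t in resolved if t.get("csat") is not None),
        # FCR: share of resolved tickets that were never escalated
        "fcr_rate": _avg(0.0 if t.get("escalated") else 1.0 for t in resolved),
        # SLA compliance: share of resolved tickets resolved by the deadline
        "sla_compliance_rate": _avg(
            1.0 if t["resolved_at"] <= t["sla_resolution_deadline"] else 0.0
            for t in resolved),
    }
```

Metrics with no qualifying tickets come back as None rather than 0, so an empty group is not mistaken for a poor result on a dashboard.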

Low-Level Design: Customer Support Ticketing System — Routing, SLA, Escalation, and Knowledge Base

Core Entities

Ticket Routing

SLA Tracking and Escalation

Knowledge Base Integration

Canned Responses and Macros

Analytics and Reporting