Core Entities
Ticket: ticket_id, customer_id, subject, description, priority (LOW, MEDIUM, HIGH, URGENT), status (OPEN, IN_PROGRESS, WAITING_ON_CUSTOMER, RESOLVED, CLOSED), category, assigned_agent_id, created_at, first_response_at, resolved_at, sla_breached.
Agent: agent_id, name, team_id, skills[], max_tickets, current_ticket_count, is_available.
SLAPolicy: policy_id, priority, first_response_sla_minutes, resolution_sla_minutes.
Message: message_id, ticket_id, sender_id, sender_type (CUSTOMER, AGENT, BOT), content, created_at.
KBArticle: article_id, title, content, tags[], helpful_count, not_helpful_count, view_count.
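One way to pin these entities down in code is a set of Python dataclasses and enums. This is a minimal sketch, assuming integer IDs and an in-memory model; the `has_capacity()` helper and the `max_tickets` default are illustrative additions, not part of the entity list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Priority(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    URGENT = "URGENT"


class Status(Enum):
    OPEN = "OPEN"
    IN_PROGRESS = "IN_PROGRESS"
    WAITING_ON_CUSTOMER = "WAITING_ON_CUSTOMER"
    RESOLVED = "RESOLVED"
    CLOSED = "CLOSED"


class SenderType(Enum):
    CUSTOMER = "CUSTOMER"
    AGENT = "AGENT"
    BOT = "BOT"


@dataclass
class Ticket:
    ticket_id: int
    customer_id: int
    subject: str
    description: str
    priority: Priority
    category: str
    created_at: datetime
    status: Status = Status.OPEN
    assigned_agent_id: Optional[int] = None
    first_response_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    sla_breached: bool = False  # set by the SLA breach job for reporting


@dataclass
class Agent:
    agent_id: int
    name: str
    team_id: int
    skills: List[str] = field(default_factory=list)
    max_tickets: int = 10  # arbitrary placeholder default
    current_ticket_count: int = 0
    is_available: bool = True

    def has_capacity(self) -> bool:
        # Hypothetical helper: available and below the per-agent ticket cap
        return self.is_available and self.current_ticket_count < self.max_tickets


@dataclass
class SLAPolicy:
    policy_id: int
    priority: Priority
    first_response_sla_minutes: int
    resolution_sla_minutes: int


@dataclass
class Message:
    message_id: int
    ticket_id: int
    sender_id: int
    sender_type: SenderType
    content: str
    created_at: datetime


@dataclass
class KBArticle:
    article_id: int
    title: str
    content: str
    tags: List[str] = field(default_factory=list)
    helpful_count: int = 0
    not_helpful_count: int = 0
    view_count: int = 0
```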
Ticket Routing
Automated routing on ticket creation: (1) Category detection: an NLP classifier on subject + description assigns the category (e.g., BILLING, TECHNICAL, RETURNS, ACCOUNT). (2) Priority assignment: rules (keywords such as “urgent” or “critical”, a VIP customer tag) combined with an ML model that predicts severity. (3) Agent assignment: among available agents with the matching skill, prefer those with the most spare capacity (fewest current tickets relative to max_tickets), the highest skill level, and the shortest average handle time for this category. Combine these factors into a weighted score, for example score = (capacity_weight * available_capacity) + (skill_weight * skill_score), optionally with a third term for handle time, and assign the highest-scoring available agent.
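from __future__ import annotations

from typing import Optional


class TicketRouter:
    def assign_agent(self, ticket: Ticket) -> Optional[Agent]:
        # Candidates: agents skilled in the ticket's category, with spare
        # capacity, on the team that owns this category.
        candidates = self.db.get_agents(
            skill=ticket.category,
            has_capacity=True,
            team=self.get_team_for_category(ticket.category),
        )
        if not candidates:
            return None  # queue the ticket; assign when an agent becomes available

        def score(agent: Agent) -> float:
            # Share of capacity still free: 1.0 = no tickets, 0.0 = at max_tickets
            capacity_ratio = 1 - (agent.current_ticket_count / agent.max_tickets)
            skill_level = agent.skill_level(ticket.category)  # 1-5
            # Weighted score: 60% spare capacity, 40% normalized skill level
            return 0.6 * capacity_ratio + 0.4 * (skill_level / 5)

        return max(candidates, key=score)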
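The routing steps also mention average handle time (AHT), which the TicketRouter score leaves out. Below is a self-contained sketch that folds it in as a third weighted term. The weights (0.5 / 0.3 / 0.2), the per-category `skill_levels` and `avg_handle_minutes` fields, and the neutral 0.5 speed score for agents without history are all assumptions for illustration, not values from the design above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical weights; tune against real outcomes.
CAPACITY_WEIGHT = 0.5
SKILL_WEIGHT = 0.3
AHT_WEIGHT = 0.2


@dataclass
class Agent:
    agent_id: int
    max_tickets: int
    current_ticket_count: int
    is_available: bool = True
    skill_levels: Dict[str, int] = field(default_factory=dict)  # category -> 1..5 (assumed)
    avg_handle_minutes: Dict[str, float] = field(default_factory=dict)  # category -> AHT (assumed)


def pick_agent(agents: List[Agent], category: str) -> Optional[Agent]:
    # Filter: available, skilled in this category, and below max_tickets.
    candidates = [
        a for a in agents
        if a.is_available
        and category in a.skill_levels
        and a.current_ticket_count < a.max_tickets
    ]
    if not candidates:
        return None  # caller queues the ticket

    # Fastest known AHT among candidates, used to normalize the speed term.
    known = [a.avg_handle_minutes[category] for a in candidates
             if a.avg_handle_minutes.get(category, 0) > 0]
    fastest = min(known) if known else None

    def score(a: Agent) -> float:
        capacity = 1 - a.current_ticket_count / a.max_tickets
        skill = a.skill_levels[category] / 5
        aht = a.avg_handle_minutes.get(category, 0)
        # 1.0 for the fastest agent, lower for slower ones; 0.5 with no history
        speed = fastest / aht if fastest is not None and aht > 0 else 0.5
        return CAPACITY_WEIGHT * capacity + SKILL_WEIGHT * skill + AHT_WEIGHT * speed

    return max(candidates, key=score)
```

With these weights a lightly loaded, mid-skill agent can outrank a busy expert; raising SKILL_WEIGHT shifts that balance toward expertise.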
SLA Tracking and Escalation
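SLA = Service Level Agreement. First-response targets per priority: URGENT within 1 hour, HIGH within 4 hours, MEDIUM within 8 hours, LOW within 24 hours (stored as SLAPolicy.first_response_sla_minutes, alongside resolution_sla_minutes). Track first_response_at (set when an agent first replies) and resolved_at (set when the ticket moves to RESOLVED). SLA breach check: a background job runs every 5 minutes. For each open ticket, look up the SLA policy for its priority and compute sla_deadline = created_at + first_response_sla_minutes. If sla_deadline < NOW() and first_response_at is NULL, the first-response SLA is breached: escalate. Escalation: reassign to a senior agent, notify the team lead via Slack/email, and mark ticket.sla_breached = true for reporting; the job should escalate each breach only once rather than on every run. Resolution SLA breach: same pattern, with sla_deadline = created_at + resolution_sla_minutes and resolved_at in place of first_response_at.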
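A minimal sketch of one pass of the breach-check job, assuming tickets and policies are already loaded in memory (a real job would query the database). The `breached_kinds` set is a hypothetical field that lets first-response and resolution breaches each escalate exactly once; `sla_breached` is derived from it. The caller would perform the actual reassignment and notification for each returned pair.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

ACTIVE_STATUSES = {"OPEN", "IN_PROGRESS", "WAITING_ON_CUSTOMER"}


@dataclass
class SLAPolicy:
    priority: str
    first_response_sla_minutes: int
    resolution_sla_minutes: int


@dataclass
class Ticket:
    ticket_id: int
    priority: str
    status: str
    created_at: datetime
    first_response_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    # Breach kinds already escalated (assumed field)
    breached_kinds: Set[str] = field(default_factory=set)

    @property
    def sla_breached(self) -> bool:
        return bool(self.breached_kinds)


def check_sla(tickets: List[Ticket], policies: Dict[str, SLAPolicy],
              now: datetime) -> List[Tuple[int, str]]:
    """Flag and return newly breached (ticket_id, kind) pairs."""
    newly_breached = []
    for t in tickets:
        if t.status not in ACTIVE_STATUSES:
            continue
        policy = policies[t.priority]
        checks = [
            ("FIRST_RESPONSE", t.first_response_at,
             t.created_at + timedelta(minutes=policy.first_response_sla_minutes)),
            ("RESOLUTION", t.resolved_at,
             t.created_at + timedelta(minutes=policy.resolution_sla_minutes)),
        ]
        for kind, done_at, deadline in checks:
            if done_at is None and deadline < now and kind not in t.breached_kinds:
                t.breached_kinds.add(kind)  # escalate once per breach kind
                newly_breached.append((t.ticket_id, kind))
    return newly_breached
```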
Knowledge Base Integration
Deflect tickets with self-service: (1) Before submission: as the customer types the subject, query the KB for relevant articles (Elasticsearch full-text search) and show the top 3. If the customer finds the answer, no ticket is created (a deflection). (2) On ticket creation: suggest KB articles to the agent to speed up resolution. (3) On resolution: prompt the agent to link the KB article used; this builds the mapping between ticket categories and articles for future article suggestions. Track KB effectiveness per article: helpful_count, not_helpful_count, and deflection_rate. Archive articles with a high not-helpful rate or zero views in the last 90 days.
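The text uses Elasticsearch full-text search; the sketch below swaps in a plain term-overlap ranking so it runs without a search cluster, plus the archival rule. The title/tag weighting, the `views_last_90_days` field, and the thresholds (not-helpful rate above 0.5 with at least 20 votes) are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class KBArticle:
    article_id: int
    title: str
    content: str
    tags: List[str] = field(default_factory=list)
    helpful_count: int = 0
    not_helpful_count: int = 0
    views_last_90_days: int = 0  # hypothetical field, not in the entity list


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_articles(subject: str, articles: List[KBArticle], k: int = 3) -> List[KBArticle]:
    """Top-k articles by term overlap with the subject (stand-in for full-text search)."""
    query = tokenize(subject)
    scored = []
    for a in articles:
        title_tags = tokenize(a.title + " " + " ".join(a.tags))
        body = tokenize(a.content)
        # Title/tag matches count double relative to body matches (assumption)
        score = 2 * len(query & title_tags) + len(query & body)
        if score > 0:
            scored.append((score, a.article_id, a))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then lowest id
    return [a for _, _, a in scored[:k]]


def should_archive(a: KBArticle, max_not_helpful_rate: float = 0.5,
                   min_votes: int = 20) -> bool:
    if a.views_last_90_days == 0:
        return True
    votes = a.helpful_count + a.not_helpful_count
    # Require enough votes so a couple of early downvotes do not archive an article
    return votes >= min_votes and a.not_helpful_count / votes > max_not_helpful_rate
```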
Canned Responses and Macros
Agents frequently send the same response to common issues. Canned responses are pre-written templates with placeholders such as {{customer_name}}, {{ticket_id}}, and {{order_number}}. Macros bundle a set of actions (set category, assign to team, add tag, send canned response) triggered by one click. Example macro “Shipping Delay”: sets category=SHIPPING, adds the tag delay, sends the shipping-delay canned response, and sets status=WAITING_ON_CUSTOMER. Macros save agents roughly 30-60 seconds per ticket and keep messaging consistent.
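A minimal sketch of placeholder rendering and one-click macro execution, modeling the “Shipping Delay” macro as an ordered list of (action, argument) pairs. The canned-response wording, the action names, and the choice to raise on a missing placeholder value are assumptions; failing loudly avoids sending a customer a literal `{{order_number}}`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Hypothetical canned-response library keyed by name.
CANNED_RESPONSES = {
    "shipping_delay": (
        "Hi {{customer_name}}, your order {{order_number}} is delayed. "
        "We will update ticket #{{ticket_id}} when it ships."
    ),
}

# The "Shipping Delay" macro as ordered (action, argument) pairs.
SHIPPING_DELAY_MACRO: List[Tuple[str, str]] = [
    ("set_category", "SHIPPING"),
    ("add_tag", "delay"),
    ("send_canned", "shipping_delay"),
    ("set_status", "WAITING_ON_CUSTOMER"),
]


@dataclass
class Ticket:
    ticket_id: int
    customer_name: str
    category: str = ""
    team: str = ""
    status: str = "OPEN"
    tags: List[str] = field(default_factory=list)
    sent_messages: List[str] = field(default_factory=list)


def render(template: str, values: Dict[str, object]) -> str:
    """Replace {{name}} placeholders; raise KeyError if a value is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key!r}")
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)


def run_macro(ticket: Ticket, macro: List[Tuple[str, str]], extra: Dict[str, object]) -> None:
    """Apply every action of a macro to the ticket, in order."""
    for action, arg in macro:
        if action == "set_category":
            ticket.category = arg
        elif action == "assign_team":
            ticket.team = arg
        elif action == "add_tag":
            if arg not in ticket.tags:
                ticket.tags.append(arg)
        elif action == "send_canned":
            values = {"customer_name": ticket.customer_name,
                      "ticket_id": ticket.ticket_id, **extra}
            ticket.sent_messages.append(render(CANNED_RESPONSES[arg], values))
        elif action == "set_status":
            ticket.status = arg
        else:
            raise ValueError(f"unknown macro action {action!r}")
```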
Analytics and Reporting
Key metrics: average First Response Time (FRT) by priority, team, and agent; Average Handle Time (AHT); resolution rate by category; SLA compliance rate (% of tickets meeting their SLA); Customer Satisfaction (CSAT), from a survey sent after resolution; Net Promoter Score (NPS) for long-term loyalty; agent utilization = current_ticket_count / max_tickets. Ticket volume trends: detect spikes (e.g., a product outage or a bad batch of orders) by comparing each hour's volume with the same hour last week. The dashboard shows current queue status in real time; daily reports are emailed to team leads.
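Two of these metrics sketched in plain Python: average FRT over (created_at, first_response_at) pairs, and week-over-week spike detection on hourly volume. The spike thresholds (at least 2x last week's count and at least 10 tickets in the hour) are assumptions; the `min_count` floor keeps quiet hours from being flagged on tiny absolute changes.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


def avg_first_response_minutes(
        tickets: Iterable[Tuple[datetime, Optional[datetime]]]) -> Optional[float]:
    """Average FRT in minutes over tickets that have a first response."""
    deltas = [(first - created).total_seconds() / 60
              for created, first in tickets if first is not None]
    return sum(deltas) / len(deltas) if deltas else None


def hourly_volume(created_times: Iterable[datetime]) -> Counter:
    """Ticket count per clock hour (timestamps truncated to the hour)."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in created_times)


def detect_spikes(created_times: Iterable[datetime], hours: List[datetime],
                  ratio: float = 2.0, min_count: int = 10) -> List[Tuple[datetime, int, int]]:
    """Return (hour, current, baseline) for hours well above the same hour last week."""
    counts = hourly_volume(created_times)
    spikes = []
    for hour in hours:
        current = counts.get(hour, 0)
        baseline = counts.get(hour - timedelta(weeks=1), 0)
        # max(baseline, 1) treats an empty hour last week as one ticket
        if current >= min_count and current >= ratio * max(baseline, 1):
            spikes.append((hour, current, baseline))
    return spikes
```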
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you design an intelligent ticket routing system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Ticket routing assigns incoming support tickets to the right agent or queue. There are two approaches: rule-based and ML-based. Rule-based: if category=billing, route to the Billing team; if priority=P1, route to the Senior team. Define routing rules in a config table with conditions and target queues. ML-based: train a classifier on historical tickets (text -> category -> team). Use TF-IDF or sentence embeddings + logistic regression to predict the category, then map the category to a team. Hybrid: ML classifies the category and rules map the category to a team. Skills-based routing: match the ticket's required_skills to agent skill_tags. Load balancing within a team: assign to the agent with the fewest open tickets. Overflow: if a queue exceeds a depth threshold, escalate to a higher-tier team."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement SLA tracking and breach prevention?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "An SLA defines response and resolution time targets by priority: P1 = 1h response / 4h resolution, P2 = 4h response / 24h resolution, P3 = 24h response / 72h resolution. Store sla_response_deadline and sla_resolution_deadline on each ticket, computed at creation time: NOW() + priority_config.response_hours. Track first_response_at (set when an agent sends the first message) and resolved_at. SLA status: WITHIN (not breached, not near), AT_RISK (within 20% of the deadline), BREACHED (past the deadline). A background job runs every 5 minutes: UPDATE tickets SET sla_status=AT_RISK WHERE sla_response_deadline [...] L2 (technical) -> L3 (engineering). Store escalation history: (ticket_id, from_tier, to_tier, reason, escalated_by, escalated_at). On escalation: notify the new tier, set a new SLA based on the escalated priority, and remove the ticket from the original agent's queue. Auto-escalation: a scheduled job checks for tickets near SLA breach and escalates them automatically with reason=AUTO_SLA_RISK. Track the escalation rate per team as a quality metric; a high escalation rate from L1 indicates training gaps."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement a knowledge base for agent-assisted resolution?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A knowledge base stores articles indexed by keywords and categories. Integration with ticketing: when a new ticket arrives, match its text against articles (TF-IDF keywords or embedding similarity) and surface the top 3 relevant articles to the agent. The agent can insert an article link into the reply with one click. Track article effectiveness: when an agent uses article X and the ticket is resolved without further escalation, increment article.success_count. Articles with a low success rate surface for review. For self-service deflection: show relevant articles to customers before they submit a ticket. Track the deflection rate (the customer viewed an article and did not submit a ticket). Articles are versioned; when one is updated, notify agents of the change. Categorize articles by product area for search filtering."
      }
    },
    {
      "@type": "Question",
      "name": "How do you track and report customer support metrics?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Core metrics: First Response Time (FRT), the time from ticket creation to the first agent message; Average Handle Time (AHT), the time from first response to resolution; Customer Satisfaction Score (CSAT), a post-resolution survey (1-5 rating); First Contact Resolution (FCR), the percentage of tickets resolved without escalation; SLA compliance rate, the percentage of tickets resolved within SLA. Compute metrics in a reporting database rather than the transactional DB, to avoid heavy joins on large production tables. A nightly ETL job aggregates raw ticket events into a summary table: (date, team, priority, avg_frt_minutes, avg_aht_minutes, csat_avg, fcr_rate, sla_compliance_rate). Dashboards query the summary table. Real-time metrics (current queue depth, tickets at risk) are computed from Redis counters updated on each ticket state change."
      }
    }
  ]
}
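The SLA answer above classifies each ticket as WITHIN, AT_RISK, or BREACHED. A minimal sketch of that classification follows, assuming “within 20% of the deadline” means less than 20% of the SLA window remains; the `at_risk_fraction` parameter and function names are hypothetical. The deadline is computed once at creation, as the answer describes.

```python
from datetime import datetime, timedelta
from enum import Enum


class SLAStatus(Enum):
    WITHIN = "WITHIN"
    AT_RISK = "AT_RISK"
    BREACHED = "BREACHED"


def response_deadline(created_at: datetime, response_hours: float) -> datetime:
    # Stored on the ticket at creation time: created_at + response_hours
    return created_at + timedelta(hours=response_hours)


def sla_status(created_at: datetime, deadline: datetime, now: datetime,
               at_risk_fraction: float = 0.2) -> SLAStatus:
    """Classify one SLA clock relative to its deadline."""
    if now > deadline:
        return SLAStatus.BREACHED
    window = deadline - created_at
    remaining = deadline - now
    # AT_RISK once the remaining time drops to 20% of the full window or less
    if remaining <= window * at_risk_fraction:
        return SLAStatus.AT_RISK
    return SLAStatus.WITHIN
```

For a P1 ticket with a 1-hour response target, this marks the ticket AT_RISK in the last 12 minutes before the deadline, which gives the auto-escalation job a window to act before the breach.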
Asked at: Atlassian Interview Guide
Asked at: Shopify Interview Guide
Asked at: DoorDash Interview Guide
Asked at: Snap Interview Guide