Call Observability Metrics
Overview
Call Observability provides a centralized, real-time view of AI agent health by surfacing latency, errors, interruptions, and quality signals. Metrics are directly linked to underlying transcripts to enable fast debugging and iteration.
Noteworthy:
- Metrics are computed shortly after call completion.
- All time-series charts display P50 and P90 values.
- All metrics are filterable by date range and agent.
Data Cuts
Metrics can be viewed across three different “cuts” of calls, each serving a different purpose:
- All Calls
  Includes every inbound call handled by the system.
  Best for: overall volume and funnel health.
- Interaction Calls
  Calls with at least one user turn and one agent turn.
  Best for: latency, interruptions, and turn-based performance metrics.
- Connected Calls (Disposition-Based)
  Calls marked as “Answered” based on system and configured dispositions.
  Best for: business-level reporting aligned with customer definitions.
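To make the three cuts concrete, the following is a minimal sketch of how a single call could be assigned to them. The `Call` record, its field names, and the `ANSWERED_DISPOSITIONS` set are hypothetical and chosen only for illustration; the actual disposition configuration is not described here.

```python
from dataclasses import dataclass

# Hypothetical call record; field names are assumptions for illustration.
@dataclass
class Call:
    user_turns: int
    agent_turns: int
    disposition: str

# Assumed set of dispositions that count as "Answered"; in practice this is configured.
ANSWERED_DISPOSITIONS = {"answered"}

def cuts_for(call: Call) -> set[str]:
    """Return the name of every cut this call belongs to."""
    cuts = {"all"}  # every inbound call is part of All Calls
    if call.user_turns >= 1 and call.agent_turns >= 1:
        cuts.add("interaction")  # at least one user turn and one agent turn
    if call.disposition.lower() in ANSWERED_DISPOSITIONS:
        cuts.add("connected")  # disposition-based
    return cuts
```

Under these assumptions a call can belong to several cuts at once, so the cuts are overlapping views rather than a partition.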
Event-Driven Metrics
Call Minutes
- Description: Total duration of calls handled by AI agents.
- Calculation: Sum of call durations across all calls within the selected time range.
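As a rough sketch of this sum, the function below totals durations for calls inside a time range. It assumes each call is a `(started_at, duration_seconds)` pair and that a call is attributed to the range by its start time using a half-open interval; both choices are assumptions, not documented behavior.

```python
from datetime import datetime

def call_minutes(calls, start, end):
    """Total call minutes for calls that started in [start, end).

    calls: iterable of (started_at: datetime, duration_seconds: float).
    Attributing a call by its start time is an assumption of this sketch.
    """
    total_seconds = sum(
        duration for started_at, duration in calls if start <= started_at < end
    )
    return total_seconds / 60
```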
End-to-End (E2E) Latency
- Description: Time between user input and agent response (perceived delay between turns). Turns that do not have a full end-to-end ...
- Calculation: For each turn, measure time from end of user speech to start of agent response; aggregated as P50/P90 across calls.
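The per-turn measurement and the P50/P90 aggregation can be sketched as follows. The turn representation (a pair of timestamps in seconds) and the nearest-rank percentile method are assumptions; the percentile method the charts actually use is not specified.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100.

    Nearest-rank is an assumption; the dashboard may interpolate instead.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def e2e_latencies(turns):
    """turns: iterable of (user_speech_end, agent_response_start) in seconds."""
    return [agent_start - user_end for user_end, agent_start in turns]

# Example: four turns pooled from one or more calls.
turns = [(1.0, 1.5), (4.0, 5.0), (8.0, 8.25), (10.0, 12.0)]
latencies = e2e_latencies(turns)
p50, p90 = percentile(latencies, 50), percentile(latencies, 90)
```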
Action Latency
- Description: Time taken for an action/tool call to execute during a conversation.
- Calculation: Time from action invocation to action completion; aggregated per action and agent as P50/P90.
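A minimal sketch of the per-action, per-agent aggregation is shown below. The event tuple layout and the nearest-rank percentile helper are hypothetical and used only to illustrate the grouping.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile (an assumed method) of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]

def action_latency_summary(events):
    """events: iterable of (agent, action, invoked_at, completed_at) in seconds.

    Returns {(agent, action): {"p50": ..., "p90": ...}}.
    """
    grouped = defaultdict(list)
    for agent, action, invoked_at, completed_at in events:
        grouped[(agent, action)].append(completed_at - invoked_at)
    return {
        key: {"p50": percentile(durations, 50), "p90": percentile(durations, 90)}
        for key, durations in grouped.items()
    }
```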
Action Failures
- Description: Rate of failed action/tool executions.
- Calculation: (# failed action calls) / (total action calls) over time.
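The ratio above can be sketched as a time series by bucketing action calls. Daily buckets and the `(invoked_at, succeeded)` event shape are assumptions made for this example; the actual bucket size is not stated.

```python
from collections import defaultdict
from datetime import datetime

def action_failure_rate_by_day(events):
    """events: iterable of (invoked_at: datetime, succeeded: bool).

    Returns {date: failed / total}, with daily buckets as an assumed granularity.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for invoked_at, succeeded in events:
        day = invoked_at.date()
        totals[day] += 1
        if not succeeded:
            failures[day] += 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}
```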
Contact Interruption Rate
- Description: Frequency at which the user interrupts the agent mid-response.
- Calculation: (# turns where contact interrupts agent) / (total turns), computed per call and aggregated as P50/P90 across calls.
Agent Interruption Rate
- Description: Frequency at which the agent interrupts the user.
- Calculation: (# turns where agent interrupts contact) / (total turns), computed per call and aggregated as P50/P90 across calls.
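Both interruption rates follow the same pattern: a ratio per call, then percentiles over those ratios. The sketch below assumes each call is a list of per-turn labels (`"contact"`, `"agent"`, or `"none"`, all hypothetical) and uses a nearest-rank percentile, which is also an assumption.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (an assumed method) of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]

def interruption_rate(turns, interrupter):
    """Share of turns in one call where `interrupter` cut in."""
    return sum(1 for label in turns if label == interrupter) / len(turns)

def interruption_summary(calls, interrupter):
    """P50/P90 of the per-call interruption rate; calls without turns are skipped."""
    rates = [interruption_rate(turns, interrupter) for turns in calls if turns]
    return {"p50": percentile(rates, 50), "p90": percentile(rates, 90)}

# Each inner list holds hypothetical per-turn labels for one call.
calls = [
    ["none", "contact", "none", "none"],
    ["contact", "contact"],
    ["none", "agent"],
]
contact = interruption_summary(calls, "contact")
agent = interruption_summary(calls, "agent")
```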
LLM Metrics (Coming Soon)
All LLM Issues
- Description: Percentage of calls with any detected LLM-related issue.
- Calculation: (# calls with ≥1 LLM issue) / (total calls).
Hallucination Rate
- Description: Frequency of agent generating incorrect or fabricated information.
- Calculation: (# calls with hallucination detected) / (total calls).
Guardrail Breach Rate
- Description: Frequency of violations of defined safety or compliance constraints.
- Calculation: (# calls with guardrail breach) / (total calls).
Repetition Rate
- Description: Frequency of unnecessary repeated responses.
- Calculation: (# calls with repeated responses) / (total calls).
Robotic Language Rate
- Description: Frequency of unnatural or system-like speech (e.g., narrating internal actions).
- Calculation: (# calls with robotic language patterns) / (total calls).
Irrelevance Rate
- Description: Frequency of responses that are not relevant to user intent or context.
- Calculation: (# calls with irrelevant responses) / (total calls).
Incoherence Rate
- Description: Frequency of logically inconsistent or contradictory responses.
- Calculation: (# calls with incoherent responses) / (total calls).
Wrong Action Invocation Rate
- Description: Frequency of incorrect tool/action usage.
- Calculation: (# calls with incorrect action invocation) / (total calls).
Wrong State Transition Rate
- Description: Frequency of incorrect transitions in multi-step agent flows.
- Calculation: (# calls with incorrect state transitions) / (total calls).
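All of the LLM metrics above share one shape: calls flagged with an issue divided by total calls. As a hedged sketch, assuming each call carries a set of detected issue labels (the label names here are hypothetical, since these metrics are not yet released), the rates could be computed like this:

```python
def llm_issue_rates(calls_issues):
    """calls_issues: list of sets of issue labels, one set per call.

    Returns the share of calls with any issue plus a per-label rate.
    """
    total = len(calls_issues)
    if total == 0:
        return {}
    rates = {"all_llm_issues": sum(1 for issues in calls_issues if issues) / total}
    for label in sorted(set().union(*calls_issues)):
        rates[label] = sum(1 for issues in calls_issues if label in issues) / total
    return rates
```

Because a call with several issue types is counted once in the overall rate, the per-label rates can sum to more than the "All LLM Issues" value.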
Coming Soon
- Metrics are linked to transcripts for root cause analysis (dashboard → transcript → turn-level inspection).
