Call Observability Metrics
Call Observability
Overview
Call Observability provides a centralized, real-time view of AI agent health by surfacing latency, errors, interruptions, and quality signals. Metrics are directly linked to underlying transcripts to enable fast debugging and iteration.
Noteworthy:
- Metrics are computed shortly after call completion.
- All time-series charts display P50 and P90 values.
- All metrics are filterable by date range and agent.
Data Cuts
Metrics can be viewed across three different “cuts” of calls, each serving a different purpose:
- All Calls
  Includes every inbound call handled by the system.
  Best for: overall volume and funnel health.
- Interaction Calls
  Calls with at least one user turn and one agent turn.
  Best for: latency, interruptions, and turn-based performance metrics.
- Connected Calls (Disposition-Based)
  Calls marked as “Answered” based on system and configured dispositions.
  Best for: business-level reporting aligned with customer definitions.
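A single call can belong to more than one cut. The Python sketch below shows how the membership rules above could be expressed. The `CallSummary` record, its field names, and the literal disposition value `"Answered"` are hypothetical assumptions for illustration. They do not describe Regal's actual data model.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    # Hypothetical call record; field names are illustrative assumptions.
    call_id: str
    user_turns: int   # number of turns spoken by the caller
    agent_turns: int  # number of turns spoken by the AI agent
    disposition: str  # assumed to hold the final disposition label


def call_cuts(call: CallSummary) -> set[str]:
    """Return every data cut this call belongs to (cuts can overlap)."""
    cuts = {"all_calls"}  # every handled call is in the All Calls cut
    # Interaction Calls: at least one user turn and at least one agent turn.
    if call.user_turns >= 1 and call.agent_turns >= 1:
        cuts.add("interaction_calls")
    # Connected Calls: disposition marked as "Answered" (assumed literal value).
    if call.disposition == "Answered":
        cuts.add("connected_calls")
    return cuts
```

For example, a call with two user turns, three agent turns, and an "Answered" disposition falls into all three cuts, while a call with no user turns falls only into All Calls unless its disposition says otherwise.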
Event-Driven Metrics
Call Minutes
- Description: Total duration of calls handled by AI agents.
- Calculation: Sum of call durations across all calls within the selected time range.
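As a minimal sketch of this calculation, the Python below sums durations for calls in a date range, with an optional agent filter. The `Call` record and its fields are hypothetical, and it assumes a call belongs to the range when it *started* inside the range, which the description above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Call:
    # Hypothetical call record; field names are illustrative assumptions.
    call_id: str
    agent: str
    started_at: datetime
    duration_seconds: float


def call_minutes(
    calls: list[Call],
    start: datetime,
    end: datetime,
    agent: Optional[str] = None,
) -> float:
    """Total call duration in minutes for calls that started in [start, end)."""
    total_seconds = sum(
        c.duration_seconds
        for c in calls
        # Assumption: range membership is decided by the call's start time.
        if start <= c.started_at < end and (agent is None or c.agent == agent)
    )
    return total_seconds / 60.0
```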
End-to-End (E2E) Latency
- Description: Time between customer input and agent response (the perceived delay between turns).
- Calculation: For each turn, measure time from end of user speech to start of agent response; aggregated as P50/P90 across calls.
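The per-turn calculation and P50/P90 aggregation can be sketched in Python as follows. The `Turn` record and its millisecond timestamps are hypothetical. The sketch assumes "across calls" means pooling every turn from every selected call before taking percentiles, and it uses linear-interpolation percentiles (the same convention as NumPy's default); the dashboard's exact percentile method is not documented here.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical per-turn timestamps, in milliseconds since call start.
    user_speech_end_ms: float
    agent_speech_start_ms: float


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    if not values:
        raise ValueError("cannot take a percentile of empty data")
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0  # fractional rank into the sorted data
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def e2e_latency_p50_p90(calls: list[list[Turn]]) -> tuple[float, float]:
    """Pool per-turn E2E latency across all calls; return (P50, P90) in ms."""
    latencies = [
        # Time from end of user speech to start of agent response.
        turn.agent_speech_start_ms - turn.user_speech_end_ms
        for turns in calls
        for turn in turns
    ]
    return percentile(latencies, 50), percentile(latencies, 90)
```

For two calls whose turns have latencies of 800 and 600 ms and of 1000 and 1200 ms, this returns a P50 of 900 ms and a P90 of roughly 1140 ms.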
The components in the table below do not always occur in a strict linear sequence, but together they drive overall E2E latency.
| Component | Definition | Levers to improve |
|---|---|---|
| First turn latency | Customer + Regal. Time from call connection to the agent's first spoken word. Includes answering machine detection (AMD), greeting generation via the LLM, and dialer overhead. | Use a static greeting to skip first-turn LLM generation. Tune AMD settings. Adjust dialer configuration. Reduce and restructure the system prompt. |
| Speech-to-text | Deepgram via LiveKit. Time to transcribe the caller's speech into text. This is typically one of the smaller contributors to total E2E latency. | Limited direct control because this is provider-dependent. Regal evaluates STT providers for speed. |
| End of utterance | Customer-tunable. Time for the system to detect that the caller has finished speaking. This includes voice activity detection (VAD), Deepgram transcription delay, and LiveKit end-of-turn detection. This delay happens before the LLM starts generating a response. | Tune the responsiveness setting in the agent config. A shorter end-of-turn delay can reduce perceived latency, but it can clip callers who pause mid-sentence. This is an often-overlooked source of higher E2E latency. |
| LLM latency | Largest component. Time to first token (TTFT): how long the LLM takes to generate the first token of its response. TTFT of 500 ms or less is generally sufficient for most voice AI use cases. This is usually the largest contributor to E2E latency and scales with prompt length. | Reduce and restructure the prompt. Pre-emptive generation can also help. Regal engineering is exploring self-hosted inference and caching options. |
| Text-to-speech | Customer-tunable. Time to first byte (TTFByte): how long the TTS provider takes to begin generating audio from the LLM's text output. This is generally a smaller contributor than LLM latency. | Switch the voice vendor on the agent. Test alternative voices within the same provider because latency can vary by voice model. Regal also evaluates TTS providers for speed and quality. |
| Network / other | Hard to control. Additional latency from network transit, audio decoding, microphone processing, and other system overhead. This is measured as the gap between total silence and the start or end of what is received from LiveKit. | Minimal direct control. Most improvements come from infrastructure-level optimizations by Regal engineering. |
| Function invoke-to-response | Customer + Regal. Time from the start of a function or action invocation to function completion. This should be grouped by function type, such as transfer, schedule, or custom API call, to make comparisons useful. It may include intentional delays such as transfer bridge statements. | Improve response times for customer-owned API endpoints. Add bridge statements before slow actions to reduce perceived latency. For default actions such as schedule_callback and warm_transfer, Regal engineering owns fixes. |
| Multi-state transition | Under investigation. Time for the agent to transition between states in a multi-state flow. This is measured from LLM time-to-first-byte through the TTS provider. Non-latency metadata, such as the state transition itself, will be surfaced separately. | This is an active engineering workstream and is not yet customer-tunable. |
Action Latency
- Description: Time taken for an action/tool call to execute during a conversation. This time can include agent speech if Speak During is configured for the action.
- Calculation: Time from action invocation to action completion; aggregated per action and agent as P50/P90.
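A minimal Python sketch of the per-action, per-agent aggregation follows. The `ActionEvent` record and its fields are hypothetical, and the percentile helper uses linear interpolation as an assumption. The action names in any sample data (for example `warm_transfer`) are only labels; the sketch does not call any Regal API.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ActionEvent:
    # Hypothetical action/tool-call record; timestamps in milliseconds.
    agent: str
    action: str
    invoked_at_ms: float
    completed_at_ms: float


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def action_latency_p50_p90(
    events: list[ActionEvent],
) -> dict[tuple[str, str], tuple[float, float]]:
    """Group latencies by (agent, action) and return (P50, P90) per group."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for e in events:
        # Invocation-to-completion time; may include Speak During audio.
        groups[(e.agent, e.action)].append(e.completed_at_ms - e.invoked_at_ms)
    return {
        key: (percentile(vals, 50), percentile(vals, 90))
        for key, vals in groups.items()
    }
```

Grouping by action type matters because a transfer and a scheduling call can differ by an order of magnitude, so a single pooled percentile would hide slow actions.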
Action Failures
- Description: Rate of failed action/tool executions. Only true failures are counted: if the agent retries an action and the retry succeeds, that action is not included in the failed action %.
- Calculation: (# failed action calls) / (total action calls) over time.
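The retry rule can be sketched in Python as below. This is a minimal sketch under stated assumptions: the `ActionAttempt` record and its `invocation_id` field (which groups an original action call with its retries) are hypothetical, and the sketch assumes the denominator counts each logical action call once rather than counting every retry attempt.

```python
from dataclasses import dataclass


@dataclass
class ActionAttempt:
    # Hypothetical record: one row per execution attempt.
    invocation_id: str  # assumed to group an action call with its retries
    succeeded: bool


def action_failure_rate(attempts: list[ActionAttempt]) -> float:
    """Share of action calls where no attempt (original or retry) succeeded."""
    succeeded_by_call: dict[str, bool] = {}
    for a in attempts:
        # A call counts as successful if any of its attempts succeeded.
        succeeded_by_call[a.invocation_id] = (
            succeeded_by_call.get(a.invocation_id, False) or a.succeeded
        )
    if not succeeded_by_call:
        return 0.0
    failed = sum(1 for ok in succeeded_by_call.values() if not ok)
    return failed / len(succeeded_by_call)
```

To chart the rate over time, the same function can be applied to the attempts in each time bucket (for example, each day).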
Contact Interruption Rate
- Description: Frequency at which the user interrupts the agent mid-response.
- Calculation: (# turns where the contact interrupts the agent) / (total turns), computed per call and aggregated as P50/P90 across calls.
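The two-step calculation (a rate per call, then percentiles across calls) can be sketched in Python as follows. The input format is a hypothetical assumption: each call is a list of booleans, one per agent turn, set to `True` when the contact interrupted that turn. The linear-interpolation percentile is also an assumption.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def interruption_rate(interrupted: list[bool]) -> float:
    """Share of turns in one call where the contact interrupted the agent."""
    return sum(interrupted) / len(interrupted)


def interruption_p50_p90(calls: list[list[bool]]) -> tuple[float, float]:
    """Compute each call's rate, then aggregate P50/P90 across calls."""
    # Skip calls with no turns to avoid dividing by zero.
    rates = [interruption_rate(turns) for turns in calls if turns]
    return percentile(rates, 50), percentile(rates, 90)
```

Computing the rate per call first keeps one long call with many interruptions from dominating the result the way a pooled turn count would.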
LLM Metrics (Coming Soon)
All LLM Issues
- Description: Percentage of calls with any detected LLM-related issue.
- Calculation: (# calls with ≥1 LLM issue) / (total calls).
Hallucination Rate
- Description: Frequency of agent generating incorrect or fabricated information.
- Calculation: (# calls with hallucination detected) / (total calls).
Guardrail Breach Rate
- Description: Frequency of violations of defined safety or compliance constraints.
- Calculation: (# calls with guardrail breach) / (total calls).
Repetition Rate
- Description: Frequency of unnecessary repeated responses.
- Calculation: (# calls with repeated responses) / (total calls).
Robotic Language Rate
- Description: Frequency of unnatural or system-like speech (e.g., narrating internal actions).
- Calculation: (# calls with robotic language patterns) / (total calls).
Irrelevance Rate
- Description: Frequency of responses that are not relevant to user intent or context.
- Calculation: (# calls with irrelevant responses) / (total calls).
Incoherence Rate
- Description: Frequency of logically inconsistent or contradictory responses.
- Calculation: (# calls with incoherent responses) / (total calls).
Wrong Action Invocation Rate
- Description: Frequency of incorrect tool/action usage.
- Calculation: (# calls with incorrect action invocation) / (total calls).
Wrong State Transition Rate
- Description: Frequency of incorrect transitions in multi-step agent flows.
- Calculation: (# calls with incorrect state transitions) / (total calls).
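Every LLM metric above shares the same ratio: calls flagged for an issue divided by total calls. The Python sketch below covers only that ratio, not the detection of issues itself. The `CallReview` record and the issue labels in `ISSUE_TYPES` are hypothetical names chosen to mirror the metrics listed here.

```python
from dataclasses import dataclass, field

# Hypothetical issue labels, one per LLM metric listed above.
ISSUE_TYPES = (
    "hallucination",
    "guardrail_breach",
    "repetition",
    "robotic_language",
    "irrelevance",
    "incoherence",
    "wrong_action_invocation",
    "wrong_state_transition",
)


@dataclass
class CallReview:
    # Hypothetical output of per-call LLM issue detection.
    call_id: str
    issues: set[str] = field(default_factory=set)


def llm_issue_rates(calls: list[CallReview]) -> dict[str, float]:
    """Share of calls flagged for each issue, plus the any-issue share."""
    total = len(calls)
    if total == 0:
        return {}
    rates = {
        issue: sum(1 for c in calls if issue in c.issues) / total
        for issue in ISSUE_TYPES
    }
    # All LLM Issues counts each call once, however many issues it has.
    rates["all_llm_issues"] = sum(1 for c in calls if c.issues) / total
    return rates
```

Because a single call can carry several issues, All LLM Issues is generally lower than the sum of the individual rates.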
Coming Soon
- Deeper navigation from the dashboard to our application.
- Linking metrics to transcripts for root-cause analysis (dashboard → transcript → turn-level inspection).
