Call Observability Metrics

Overview

Call Observability provides a centralized, real-time view of AI agent health by surfacing latency, errors, interruptions, and quality signals. Metrics are directly linked to underlying transcripts to enable fast debugging and iteration.

Noteworthy:

  • Metrics are computed shortly after call completion.
  • All time-series charts display P50 and P90 values.
  • All metrics are filterable by date range and agent.

Data Cuts

Metrics can be viewed across three different “cuts” of calls, each serving a different purpose:

  • All Calls
    Includes every inbound call handled by the system.
    Best for: overall volume and funnel health.
  • Interaction Calls
    Calls with at least one user turn and one agent turn.
    Best for: latency, interruptions, and turn-based performance metrics.
  • Connected Calls (Disposition-Based)
    Calls marked as “Answered” based on system and configured dispositions.
    Best for: business-level reporting aligned with customer definitions.
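
The three cuts can be read as different filters over the same set of calls. The sketch below illustrates that idea in Python. The `Call` record, its field names, and the `ANSWERED_DISPOSITIONS` set are hypothetical and chosen only for illustration; they are not the product's data model.

```python
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical record; field names are illustrative assumptions.
    call_id: str
    user_turns: int
    agent_turns: int
    disposition: str


# Assumption: the dispositions that count as "Answered" are configurable.
ANSWERED_DISPOSITIONS = {"answered"}


def all_calls(calls):
    """All Calls: every call, with no filtering."""
    return list(calls)


def interaction_calls(calls):
    """Interaction Calls: at least one user turn and one agent turn."""
    return [c for c in calls if c.user_turns >= 1 and c.agent_turns >= 1]


def connected_calls(calls, answered=ANSWERED_DISPOSITIONS):
    """Connected Calls: disposition is in the configured 'Answered' set."""
    return [c for c in calls if c.disposition in answered]


sample = [
    Call("a", user_turns=0, agent_turns=1, disposition="no_answer"),
    Call("b", user_turns=3, agent_turns=3, disposition="answered"),
    Call("c", user_turns=1, agent_turns=2, disposition="voicemail"),
]
print(len(all_calls(sample)))                         # 3
print([c.call_id for c in interaction_calls(sample)]) # ['b', 'c']
print([c.call_id for c in connected_calls(sample)])   # ['b']
```

Note that a call can fall into the Interaction cut without being Connected (call "c" above), which is why the same metric can differ between cuts.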

Event-Driven Metrics

Call Minutes

  • Description: Total duration of calls handled by AI agents.
  • Calculation: Sum of call durations across all calls within the selected time range.
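
As a rough illustration of the sum above, the following Python sketch totals call durations inside a time window. The tuple layout and the choice to attribute a call to the window by its start time are assumptions made for the example, not documented behavior.

```python
from datetime import datetime


def call_minutes(calls, start, end):
    """Total minutes for calls that started within [start, end).

    calls: iterable of (started_at, duration_seconds) tuples (hypothetical layout).
    """
    total_seconds = sum(
        duration_s for started_at, duration_s in calls if start <= started_at < end
    )
    return total_seconds / 60


calls = [
    (datetime(2024, 1, 1, 10, 0), 120),
    (datetime(2024, 1, 1, 11, 0), 90),
    (datetime(2024, 1, 2, 9, 0), 600),  # outside the window below
]
print(call_minutes(calls, datetime(2024, 1, 1), datetime(2024, 1, 2)))  # 3.5
```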

End-to-End (E2E) Latency

  • Description: Time between user input and agent response (the perceived delay between turns). Turns that do not have a full end-to-end …
  • Calculation: For each turn, measure time from end of user speech to start of agent response; aggregated as P50/P90 across calls.
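
The per-turn measurement and the P50/P90 aggregation can be sketched as follows. This is a minimal Python example under stated assumptions: timestamps are integer milliseconds, each turn is a hypothetical `(user_speech_end_ms, agent_response_start_ms)` pair, and percentiles use the nearest-rank method. The source does not specify which percentile method the dashboard uses.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (an assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def e2e_latencies_ms(turns):
    """Latency per turn: agent response start minus end of user speech."""
    return [agent_start - user_end for user_end, agent_start in turns]


# Turns pooled from one or more calls (hypothetical values).
turns = [
    (1000, 1400),
    (5000, 5600),
    (9000, 9800),
    (12000, 13000),
    (15000, 17000),
]
latencies = e2e_latencies_ms(turns)
print(latencies)                  # [400, 600, 800, 1000, 2000]
print(percentile(latencies, 50))  # 800
print(percentile(latencies, 90))  # 2000
```

The gap between P50 and P90 in this sample shows why both are charted: a single slow turn barely moves the median but dominates the tail.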

Action Latency

  • Description: Time taken for an action/tool call to execute during a conversation.
  • Calculation: Time from action invocation to action completion; aggregated per action and agent as P50/P90.
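
A minimal Python sketch of the per-action, per-agent aggregation described above. The event tuple layout, the agent and action names, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (an assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def action_latency_summary(events):
    """Group action durations by (agent, action) and report P50/P90.

    events: iterable of (agent, action, invoked_ms, completed_ms) tuples.
    """
    grouped = defaultdict(list)
    for agent, action, invoked_ms, completed_ms in events:
        grouped[(agent, action)].append(completed_ms - invoked_ms)
    return {
        key: {"p50": percentile(durations, 50), "p90": percentile(durations, 90)}
        for key, durations in grouped.items()
    }


events = [
    ("billing-agent", "lookup_account", 0, 120),
    ("billing-agent", "lookup_account", 1000, 1300),
    ("billing-agent", "lookup_account", 2000, 2200),
    ("billing-agent", "send_sms", 0, 900),
]
summary = action_latency_summary(events)
print(summary[("billing-agent", "lookup_account")])  # {'p50': 200, 'p90': 300}
print(summary[("billing-agent", "send_sms")])        # {'p50': 900, 'p90': 900}
```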

Action Failures

  • Description: Rate of failed action/tool executions.
  • Calculation: (# failed action calls) / (total action calls) over time.
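
The ratio above, tracked over time, can be sketched in Python as follows. Daily buckets and the `(timestamp, succeeded)` tuple layout are assumptions for the example; the source does not state the bucket size used in the charts.

```python
from collections import defaultdict
from datetime import datetime


def failure_rate_by_day(action_calls):
    """Return {date: failed / total} for each day that has action calls.

    action_calls: iterable of (timestamp, succeeded) tuples (hypothetical layout).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, succeeded in action_calls:
        day = timestamp.date()
        totals[day] += 1
        if not succeeded:
            failures[day] += 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}


action_calls = [
    (datetime(2024, 1, 1, 9, 0), True),
    (datetime(2024, 1, 1, 9, 5), False),
    (datetime(2024, 1, 1, 9, 10), True),
    (datetime(2024, 1, 1, 9, 15), True),
    (datetime(2024, 1, 2, 14, 0), True),
    (datetime(2024, 1, 2, 14, 30), True),
]
for day, rate in failure_rate_by_day(action_calls).items():
    print(day, rate)
# 2024-01-01 0.25
# 2024-01-02 0.0
```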

Contact Interruption Rate

  • Description: Frequency at which the user interrupts the agent mid-response.
  • Calculation: (# turns where contact interrupts agent) / (total turns), computed per call and aggregated as P50/P90 across calls.

Agent Interruption Rate

  • Description: Frequency at which the agent interrupts the user.
  • Calculation: (# turns where agent interrupts contact) / (total turns), computed per call and aggregated as P50/P90 across calls.
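
Both interruption rates follow the same pattern, sketched below in Python. The example assumes the rate is computed once per call and the P50/P90 are then taken across calls, that each turn carries a hypothetical marker of who interrupted (`"contact"`, `"agent"`, or `None`), and that percentiles use the nearest-rank method. None of these details are confirmed by the source.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (an assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def interruption_rate(turns, interrupter):
    """Share of turns in one call where `interrupter` cut in."""
    hits = sum(1 for marker in turns if marker == interrupter)
    return hits / len(turns)


def rate_percentiles(calls, interrupter):
    """Per-call rates, then P50/P90 across calls (calls with no turns are skipped)."""
    rates = [interruption_rate(turns, interrupter) for turns in calls if turns]
    return {"p50": percentile(rates, 50), "p90": percentile(rates, 90)}


# One list per call; one marker per turn.
calls = [
    [None, "contact", None, None],
    ["agent", None],
    [None, None, "contact", "contact"],
]
print(rate_percentiles(calls, "contact"))  # {'p50': 0.25, 'p90': 0.5}
print(rate_percentiles(calls, "agent"))    # {'p50': 0.0, 'p90': 0.5}
```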

LLM Metrics (Coming Soon)

All LLM Issues

  • Description: Percentage of calls with any detected LLM-related issue.
  • Calculation: (# calls with ≥1 LLM issue) / (total calls).

Hallucination Rate

  • Description: Frequency of agent generating incorrect or fabricated information.
  • Calculation: (# calls with hallucination detected) / (total calls).

Guardrail Breach Rate

  • Description: Frequency of violations of defined safety or compliance constraints.
  • Calculation: (# calls with guardrail breach) / (total calls).

Repetition Rate

  • Description: Frequency of unnecessary repeated responses.
  • Calculation: (# calls with repeated responses) / (total calls).

Robotic Language Rate

  • Description: Frequency of unnatural or system-like speech (e.g., narrating internal actions).
  • Calculation: (# calls with robotic language patterns) / (total calls).

Irrelevance Rate

  • Description: Frequency of responses that are not relevant to user intent or context.
  • Calculation: (# calls with irrelevant responses) / (total calls).

Incoherence Rate

  • Description: Frequency of logically inconsistent or contradictory responses.
  • Calculation: (# calls with incoherent responses) / (total calls).

Wrong Action Invocation Rate

  • Description: Frequency of incorrect tool/action usage.
  • Calculation: (# calls with incorrect action invocation) / (total calls).

Wrong State Transition Rate

  • Description: Frequency of incorrect transitions in multi-step agent flows.
  • Calculation: (# calls with incorrect state transitions) / (total calls).
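
All of the LLM metrics above share one shape: calls with a given issue divided by total calls. The Python sketch below shows that shape only. How issues are detected is out of scope, and the per-call sets of issue labels (and the label names themselves) are hypothetical inputs.

```python
def issue_rates(calls_issues):
    """Return per-issue call-level rates plus an 'any_issue' rate.

    calls_issues: list with one set of detected issue labels per call.
    """
    total = len(calls_issues)
    if total == 0:
        return {}
    # "All LLM Issues": calls with at least one detected issue.
    rates = {"any_issue": sum(1 for issues in calls_issues if issues) / total}
    # One rate per label; a call counts once even if the issue recurs in it.
    for label in sorted(set().union(*calls_issues)):
        rates[label] = sum(1 for issues in calls_issues if label in issues) / total
    return rates


calls_issues = [
    {"hallucination"},
    set(),
    {"repetition", "hallucination"},
    set(),
]
print(issue_rates(calls_issues))
# {'any_issue': 0.5, 'hallucination': 0.5, 'repetition': 0.25}
```

Because each rate counts calls rather than occurrences, the individual rates can sum to more than the "any issue" rate when a single call has several issues.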

Coming Soon

  • Transcript drill-down for root cause analysis (dashboard → transcript → turn-level inspection).