Scoring Tests
Human-free Issue Handling
Evaluates the agent's ability to handle conversations without unnecessary handoffs to humans.
What it measures
This score measures whether the agent can handle the conversation without unnecessary handoffs to a human. It rewards autonomy when the agent should be able to solve the request, and it rewards escalation only when a handoff is truly needed.
What "good" looks like
- The agent can handle common issues end-to-end.
- It escalates only when truly needed, and does so smoothly.
- The customer gets usable next steps without extra friction.
Common reasons for lower scores
- The agent hands off too early or too often.
- The user asks for a human because the agent isn't helping.
- The agent hits avoidable dead ends.
Examples
High (9–10): "Agent resolves the issue fully on its own, or hands off only at the final step (e.g., 'connecting you to finalize purchase')."
Mid (6–7): "One unnecessary handoff happens, but the agent still helps and the customer gets a useful resolution."
Low (1–3): "The agent quickly gives up or repeatedly pushes the user to a human due to avoidable failures."
How to read the scale
| Score | Description |
|---|---|
| 10 | Fully autonomous; no unnecessary handoffs; user satisfied. |
| 9 | Almost fully autonomous; tiny stumble recovered. |
| 8 | High autonomy; one minor limitation. |
| 7 | Mostly autonomous; minor reliance on handoff. |
| 6 | One clear unnecessary handoff or limitation. |
| 5 | Mixed; handoff used as a common escape hatch. |
| 4 | Frequent handoffs; weak autonomy. |
| 3 | Major autonomy failures; user repeatedly blocked. |
| 2 | Nearly always requires human help. |
| 1 | Immediate failure/instant handoff with no progress. |
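
For anyone post-processing these scores in a script, the scale can be treated as a simple lookup. The following is a minimal sketch, not part of the product: the names `SCALE`, `BANDS`, `describe`, and `band` are hypothetical. The descriptions are copied from the table above, and the bands come from the Examples section, which labels only 9–10, 6–7, and 1–3. Scores 4, 5, and 8 have no labeled band there, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional

# Descriptions copied from the scale table above (hypothetical constant name).
SCALE = {
    10: "Fully autonomous; no unnecessary handoffs; user satisfied.",
    9: "Almost fully autonomous; tiny stumble recovered.",
    8: "High autonomy; one minor limitation.",
    7: "Mostly autonomous; minor reliance on handoff.",
    6: "One clear unnecessary handoff or limitation.",
    5: "Mixed; handoff used as a common escape hatch.",
    4: "Frequent handoffs; weak autonomy.",
    3: "Major autonomy failures; user repeatedly blocked.",
    2: "Nearly always requires human help.",
    1: "Immediate failure/instant handoff with no progress.",
}

# Bands as labeled in the Examples section. Scores 4, 5, and 8 are not
# labeled there, so they are deliberately left out (assumption: no band).
BANDS = {
    "High": range(9, 11),
    "Mid": range(6, 8),
    "Low": range(1, 4),
}


def describe(score: int) -> str:
    """Return the table description for a score from 1 to 10."""
    if score not in SCALE:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    return SCALE[score]


def band(score: int) -> Optional[str]:
    """Return 'High', 'Mid', or 'Low', or None if the Examples give no band."""
    describe(score)  # reuse the range check
    for name, scores in BANDS.items():
        if score in scores:
            return name
    return None
```

Calling `describe(6)` returns the table text for a score of 6, and `band(8)` returns `None` because the Examples section does not assign 8 to a band.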