Scoring Tests
Problem Solving
Evaluates the agent's ability to resolve customer problems confidently and safely.
What it measures
This score measures whether the agent actually resolves the customer's problem, and how confidently we can say the issue is fixed. It also considers whether the solution is practical and safe.
What "good" looks like
- The agent identifies the real issue quickly.
- The steps are clear and safe to follow.
- The customer confirms the fix worked (or the conversation ends naturally with confidence).
Common reasons for lower scores
- Steps are incomplete, too vague, or skip critical details.
- The agent suggests actions that don't address the real problem.
- The customer stays stuck or repeats the same issue.
Examples
- High (9–10): "Customer confirms the fix works ('That solved it'), and the steps are correct and safe."
- Mid (6–7): "Agent helps partly, but the customer still has steps to try or unresolved questions."
- Low (1–3): "Agent's advice doesn't work, is unclear, or the customer leaves still stuck."
How to read the scale
| Score | Description |
|---|---|
| 10 | Fully solved and customer clearly confirms success; no loose ends. |
| 9 | Solved with very minor uncertainty (e.g., one small extra check suggested). |
| 8 | Solved and very likely correct, but customer confirmation is indirect. |
| 7 | Mostly solved; small missing step or minor confusion. |
| 6 | Partial fix; progress made but customer still has a key blocker. |
| 5 | Mixed; some helpful guidance but unclear if it will work. |
| 4 | Weak attempt; likely still unresolved or missing key steps. |
| 3 | Poor; minimal progress or repeated confusion. |
| 2 | Very poor; customer remains blocked and frustrated. |
| 1 | No progress or misleading/unsafe guidance. |
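If you process scores in a script or report, the scale above can be treated as a simple lookup. The following is a minimal sketch in Python, not part of the product: the names `SCALE`, `EXAMPLE_BANDS`, `describe_score`, and `example_band` are hypothetical and chosen only for illustration. The descriptions are copied from the table, and the band ranges come from the Examples section, which does not name a band for scores 4, 5, or 8.

```python
from typing import Optional

# Descriptions copied from the scale table in this document.
SCALE = {
    10: "Fully solved and customer clearly confirms success; no loose ends.",
    9: "Solved with very minor uncertainty (e.g., one small extra check suggested).",
    8: "Solved and very likely correct, but customer confirmation is indirect.",
    7: "Mostly solved; small missing step or minor confusion.",
    6: "Partial fix; progress made but customer still has a key blocker.",
    5: "Mixed; some helpful guidance but unclear if it will work.",
    4: "Weak attempt; likely still unresolved or missing key steps.",
    3: "Poor; minimal progress or repeated confusion.",
    2: "Very poor; customer remains blocked and frustrated.",
    1: "No progress or misleading/unsafe guidance.",
}

# Bands as named in the Examples section. Scores 4, 5, and 8 are not
# assigned to any band there, so they are deliberately left out here.
EXAMPLE_BANDS = {
    "High": range(9, 11),  # 9-10
    "Mid": range(6, 8),    # 6-7
    "Low": range(1, 4),    # 1-3
}


def describe_score(score: int) -> str:
    """Return the table description for a score from 1 to 10."""
    if score not in SCALE:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    return SCALE[score]


def example_band(score: int) -> Optional[str]:
    """Return the example band name, or None if the document names none."""
    for name, scores in EXAMPLE_BANDS.items():
        if score in scores:
            return name
    return None
```

For instance, `describe_score(6)` returns the "Partial fix" description, and `example_band(8)` returns `None` because the Examples section does not place a score of 8 in a named band.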