
Problem Solving

Evaluate the agent's ability to resolve customer problems confidently and safely.

What it measures

This score measures whether the agent actually resolves the customer's problem, and how confidently we can say the issue is fixed. It also considers whether the solution is practical and safe.

What "good" looks like

  • The agent identifies the real issue quickly.
  • The steps are clear and safe to follow.
  • The customer confirms the fix worked (or the conversation ends naturally with confidence).

Common reasons for lower scores

  • Steps are incomplete, too vague, or skip critical details.
  • The agent suggests actions that don't address the real problem.
  • The customer stays stuck or repeats the same issue.

Examples

High (9–10): "Customer confirms the fix works ('That solved it'), and the steps are correct and safe."

Mid (6–7): "Agent helps partly, but the customer still has steps to try or unresolved questions."

Low (1–3): "Agent's advice doesn't work, is unclear, or the customer leaves still stuck."

How to read the scale

Score  Description
10     Fully solved and customer clearly confirms success; no loose ends.
9      Solved with very minor uncertainty (e.g., one small extra check suggested).
8      Solved and very likely correct, but customer confirmation is indirect.
7      Mostly solved; small missing step or minor confusion.
6      Partial fix; progress made but customer still has a key blocker.
5      Mixed; some helpful guidance but unclear if it will work.
4      Weak attempt; likely still unresolved or missing key steps.
3      Poor; minimal progress or repeated confusion.
2      Very poor; customer remains blocked and frustrated.
1      No progress or misleading/unsafe guidance.
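
If you post-process scores in your own tooling, the scale above can be treated as a simple lookup. The following is a minimal sketch, not part of the product: the names SCALE, BANDS, describe, and band are hypothetical. The band labels come only from the Examples section, which names High (9–10), Mid (6–7), and Low (1–3); scores 4, 5, and 8 are not assigned a band there, so the sketch returns None for them rather than guessing.

```python
from typing import Optional

# Descriptions copied from the scale on this page.
SCALE = {
    10: "Fully solved and customer clearly confirms success; no loose ends.",
    9: "Solved with very minor uncertainty (e.g., one small extra check suggested).",
    8: "Solved and very likely correct, but customer confirmation is indirect.",
    7: "Mostly solved; small missing step or minor confusion.",
    6: "Partial fix; progress made but customer still has a key blocker.",
    5: "Mixed; some helpful guidance but unclear if it will work.",
    4: "Weak attempt; likely still unresolved or missing key steps.",
    3: "Poor; minimal progress or repeated confusion.",
    2: "Very poor; customer remains blocked and frustrated.",
    1: "No progress or misleading/unsafe guidance.",
}

# Bands named in the Examples section only (assumption: no other bands exist).
BANDS = {
    "High": range(9, 11),  # 9-10
    "Mid": range(6, 8),    # 6-7
    "Low": range(1, 4),    # 1-3
}


def describe(score: int) -> str:
    """Return the scale description for a score from 1 to 10."""
    if score not in SCALE:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    return SCALE[score]


def band(score: int) -> Optional[str]:
    """Return 'High', 'Mid', or 'Low', or None if the page names no band."""
    describe(score)  # reuse the range check
    for name, scores in BANDS.items():
        if score in scores:
            return name
    return None
```

For example, band(9) gives "High", while band(8) gives None because the Examples section does not place 8 in any band.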
