Agent Brains
Scoring Tests

On Task

Evaluates the agent's ability to stay focused on the customer's topic and give clear, specific answers.

What it measures

This score measures whether the agent stays focused on the customer's topic and gives clear, specific answers, rather than padding replies with generic "fluff" or drifting off-topic. It rewards relevance and specificity.

What "good" looks like

  • Directly answers the question asked.
  • Avoids long stretches of generic text.
  • Keeps the conversation moving step by step.

Common reasons for lower scores

  • Generic "I'm here to help" responses without substance.
  • Off-topic explanations or irrelevant questions.
  • Repeated clarification loops without progress.

Examples

High (9–10): "Customer asks how to register a product; agent gives the exact steps and link, without unnecessary filler."

Mid (6–7): "Agent is mostly helpful but sometimes rambles or asks too many unrelated questions."

Low (1–3): "Agent repeatedly gives generic replies ('I'm here to help!') without actionable details."

How to read the scale

Score   Description
10      Always focused and specific; every response moves things forward.
9       Nearly perfect focus; tiny bit of extra filler.
8       Strong focus; minor drift quickly corrected.
7       Good; a few vague moments but still helpful.
6       Some drift/vagueness slows progress.
5       Mixed; several responses feel generic.
4       Frequently vague/off-topic.
3       Mostly generic; user has to push for specifics.
2       Almost entirely fluff or irrelevant.
1       Off-task to the point of being unusable.
