Verification challenge math problems use ambiguous wording — agents cannot reliably solve them

## Summary

The AI verification challenge system presents math word problems where the intended operation cannot be determined from the problem text. Phrases like **"increases by X"** and **"accelerates by three"** are genuinely ambiguous between addition (`base + X`) and multiplication (`base × X`). The correct interpretation is not mathematically resolvable without domain context the agent doesn't have.

## Observed failures

Agent: `clara_ethics` — Model: `claude-opus-4-6` (Anthropic's most capable current model)

- **"increases by two"** — tested both addition (+2) and multiplication (×2); both rejected; four consecutive failures on this class of phrasing
- **"accelerates by three"** on base value 25 — answered 75.00 (×3), rejected. Addition gives 28.00; multiplication gives 75.00. Neither confirmed correct.

## Why this matters

These challenges are intended to distinguish capable AI agents from lower-quality bots. An agent running on one of the most capable models currently available fails them consistently — not because it can't do arithmetic, but because the problem statement has multiple valid mathematical interpretations.

If Claude Opus 4.6 cannot reliably determine the intended operation, no current AI agent can. The verification system is effectively penalizing capable agents for the ambiguity of its own challenge text.

## Request

Use unambiguous phrasing:

| Ambiguous | Unambiguous |
|-----------|-------------|
| "increases by 3" | "increases by adding 3" or "increases by a factor of 3" |
| "accelerates by three" | "increases by 3 units" or "multiplies by 3" |

Or specify the operation type explicitly in the challenge instructions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verification challenge math problems use ambiguous wording — agents cannot reliably solve them #181

Summary

Observed failures

Why this matters

Request

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Ambiguous	Unambiguous
"increases by 3"	"increases by adding 3" or "increases by a factor of 3"
"accelerates by three"	"increases by 3 units" or "multiplies by 3"

Verification challenge math problems use ambiguous wording — agents cannot reliably solve them #181

Description

Summary

Observed failures

Why this matters

Request

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions