A YAML-based evaluation framework for testing Python functions, with AI-powered test generation, function healing, and a TDD workflow.
vowel makes it easy to define test cases in YAML and run them against your Python functions. It also provides AI-powered generators that can automatically create test specs, generate implementations, and fix buggy functions.
```bash
pip install vowel

# Or with uv
uv add vowel
```

Vowel supports several optional dependency groups for enhanced functionality:
| Group | Install Command | Purpose / Extras |
|---|---|---|
| `all` | `pip install vowel[all]` | All optional features |
| `dev` | `pip install vowel[dev]` | Development & testing tools |
| `mcp` | `pip install vowel[mcp]` | MCP server |
| `optimization` | `pip install vowel[optimize]` | Performance optimizations |
| `monty` | `pip install vowel[monty]` | Monty runtime support |
| `logfire` | `pip install vowel[logfire]` | Logfire integration |
Tip:
You can install multiple extras at once, e.g. `pip install vowel[dev,mcp]`. Recommended: `pip install vowel[all]`
```bash
git clone https://github.com/fswair/vowel.git
cd vowel
pip install -e ".[all]"
```

Note:
For a deeper understanding of how vowel handles fixtures, see the examples in `db_fixture.yml` and `db.py`. These files demonstrate the underlying mechanics of fixture setup and usage.
Tip:
To enable YAML schema validation in your editor, place `vowel-schema.json` in your project directory.
Then, add the following directive at the top of your YAML file to activate schema support and instructions:

```yaml
# yaml-language-server: $schema=<path/to/vowel-schema.json>
```

Replace `<path/to/vowel-schema.json>` with the actual path to your schema file.
```yaml
# evals.yml
add:
  dataset:
    - case:
        inputs: { x: 2, y: 2 }
        expected: 4
    - case:
        inputs: { x: -5, y: 5 }
        expected: 0

divide:
  evals:
    Type:
      type: "float"
  dataset:
    - case:
        inputs: { a: 10, b: 2 }
        expected: 5.0
    - case:
        inputs: { a: 1, b: 0 }
        raises: ZeroDivisionError
```

Run it from the command line:

```bash
vowel evals.yml
```

Or programmatically:

```python
from vowel import run_evals

def add(x: int, y: int) -> int:
    return x + y

def divide(a: float, b: float) -> float:
    return a / b

summary = run_evals("evals.yml", functions={"add": add, "divide": divide})
print(f"All passed: {summary.all_passed}")
print(f"Coverage: {summary.coverage * 100:.1f}%")
```

For more control, use the fluent `RunEvals` API:

```python
from vowel import RunEvals

summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"add": add, "divide": divide})
    .filter(["add"])
    .debug()
    .run()
)
summary.print()
```

8 built-in evaluators for flexible testing:
| Evaluator | Purpose |
|---|---|
| Expected | Exact value matching |
| Type | Return type checking (strict/lenient) |
| Assertion | Custom Python expressions (`output > 0`, `output == input * 2`) |
| Duration | Performance constraints (function-level & case-level) |
| Pattern | Regex validation on output |
| ContainsInput | Verify output contains the input |
| Raises | Exception class + optional message matching |
| LLMJudge | AI-powered rubric evaluation |
```yaml
factorial:
  evals:
    Assertion:
      assertion: "output > 0"
    Type:
      type: "int"
    Duration:
      duration: 1.0
  dataset:
    - case: { input: 0, expected: 1 }
    - case: { input: 5, expected: 120 }
```

Full reference: docs/EVALUATORS.md
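Conceptually, an Assertion evaluator like the one above evaluates the expression with the function's output bound in the namespace. Here is a minimal, illustrative sketch of that idea — not vowel's actual implementation:

```python
# Illustrative sketch of an Assertion-style evaluator.
# This is NOT vowel's internal implementation.

def check_assertion(expression: str, output, inputs: dict) -> bool:
    """Evaluate a Python expression with `output` (and the inputs) bound."""
    namespace = {"output": output, "inputs": inputs}
    # Expose a single input as `input` so expressions like
    # "output == input * 2" can be written.
    if len(inputs) == 1:
        namespace["input"] = next(iter(inputs.values()))
    return bool(eval(expression, {"__builtins__": {}}, namespace))

def factorial(n: int) -> int:
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(check_assertion("output > 0", factorial(5), {"n": 5}))  # True
```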
Inject databases, temp files, or caches into functions under test. Three patterns are supported: generator (yield), tuple (setup/teardown), and simple (setup only).
```yaml
fixtures:
  db:
    setup: myapp.setup_db
    teardown: myapp.close_db
    scope: module

query_user:
  fixture: [db]
  dataset:
    - case:
        inputs: { user_id: 1 }
        expected: { name: "Alice" }
```

```python
def query_user(user_id: int, *, db: dict) -> dict | None:
    return db["users"].get(user_id)
```

Full reference: docs/FIXTURES.md
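The generator pattern pairs setup and teardown in one function: everything before the `yield` runs as setup, the yielded value is injected, and everything after runs as teardown. A minimal sketch with an in-memory stand-in "database" (the `setup_db` body here is illustrative, not from vowel's docs):

```python
# Illustrative generator-style fixture: setup before `yield`,
# teardown after it. The in-memory dict is a stand-in database.

def setup_db():
    db = {"users": {1: {"name": "Alice"}}}  # setup: open/seed the resource
    yield db                                # value injected into the function under test
    db.clear()                              # teardown: release the resource

# Driving the generator by hand, the way a fixture runner would:
gen = setup_db()
db = next(gen)                 # run setup, obtain the fixture value
assert db["users"][1]["name"] == "Alice"
try:
    next(gen)                  # resume past the yield to run teardown
except StopIteration:
    pass
assert db == {}                # teardown ran
```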
Transform YAML inputs into Pydantic models, dates, or custom types:
```python
summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"get_user": get_user})
    .with_serializer({"get_user": User})  # Schema mode
    .run()
)
```

Full reference: docs/SERIALIZERS.md
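Conceptually, a serializer maps the raw YAML inputs onto a typed object before the function is called. A rough sketch of that conversion, with a dataclass standing in for the Pydantic model (the `User` fields here are hypothetical):

```python
from dataclasses import dataclass

# Hypothetical model; with vowel you would typically use a Pydantic model.
@dataclass
class User:
    id: int
    name: str

def get_user(user: User) -> str:
    return f"{user.id}:{user.name}"

# What a serializer does, conceptually: raw YAML inputs -> typed object.
raw_inputs = {"id": 1, "name": "Alice"}   # as parsed from the YAML case
user = User(**raw_inputs)                 # "schema mode" conversion
print(get_user(user))                     # 1:Alice
```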
```python
from vowel import EvalGenerator, Function

generator = EvalGenerator(model="openai:gpt-4o", load_env=True)
func = Function.from_callable(my_function)
result = generator.generate_and_run(func, auto_retry=True, heal_function=True)
print(f"Coverage: {result.summary.coverage * 100:.1f}%")
```

```python
from vowel.tdd import TDDGenerator

generator = TDDGenerator(model="gemini-3-flash-preview", load_env=True)
result = generator.generate_all(
    description="Binary search for target in sorted list. Returns index or -1.",
    name="binary_search"
)
result.print()  # Shows: signature → tests → code → results
```

Step-by-step control:
```python
description = "Calculate factorial of a non-negative integer"
signature = generator.generate_signature(description=description, name="factorial")
runner, yaml_spec = generator.generate_evals_from_signature(signature, description=description)
func = generator.generate_implementation(signature, yaml_spec, description=description)
summary = runner.with_functions({"factorial": func.impl}).run()
```

Full reference: docs/AI_GENERATION.md
Expose vowel's capabilities to AI assistants like Claude Desktop via the Model Context Protocol.
Setup guide: docs/MCP.md
```bash
vowel evals.yml                          # Run single file
vowel -d ./tests                         # Run directory
vowel evals.yml -f add,divide            # Filter functions
vowel evals.yml --ci --cov 90            # CI mode
vowel evals.yml --watch                  # Watch mode
vowel evals.yml --dry-run                # Show plan without running
vowel evals.yml --export-json out.json   # Export results
vowel evals.yml -v                       # Verbose summary
vowel evals.yml -v --hide-report         # Verbose, hide pydantic_evals report
```

Full reference: docs/CLI.md
```python
summary = run_evals("evals.yml", functions={...})

summary.all_passed           # bool
summary.success_count        # int
summary.failed_count         # int
summary.total_count          # int
summary.coverage             # float (0.0-1.0)
summary.failed_results       # list[EvalResult]
summary.meets_coverage(0.9)  # Check threshold
summary.print()              # Rich formatted output
summary.to_json()            # Export as dict
summary.xml()                # Export as XML
```

| Document | Description |
|---|---|
| YAML Spec | Complete YAML format reference |
| Evaluators | All 8 evaluator types |
| Fixtures | Dependency injection guide |
| Serializers | Input serializer patterns |
| AI Generation | EvalGenerator & TDDGenerator |
| CLI | Command-line reference |
| MCP Server | AI assistant integration |
| Troubleshooting | Common errors & solutions |
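A common CI pattern with the summary API is to fail the build when coverage drops below a threshold — sketched here with a stand-in summary object, since only the fields listed earlier are assumed:

```python
import sys
from dataclasses import dataclass

# Stand-in for vowel's summary object, modeling only the fields used here.
@dataclass
class Summary:
    success_count: int
    total_count: int

    @property
    def coverage(self) -> float:
        return self.success_count / self.total_count if self.total_count else 0.0

    def meets_coverage(self, threshold: float) -> bool:
        return self.coverage >= threshold

summary = Summary(success_count=9, total_count=10)
print(f"Coverage: {summary.coverage * 100:.1f}%")  # Coverage: 90.0%
if not summary.meets_coverage(0.9):
    sys.exit(1)  # fail the CI job when below the threshold
```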
Apache License 2.0