# Organization
Each evaluation lives in its own directory with an `EVAL.yaml` entry point.
## Structure

```
evals/
├── code-review/
│   ├── EVAL.yaml
│   ├── prompts/
│   │   └── quality.md
│   └── judges/
│       └── syntax.py
├── document-extraction/
│   ├── EVAL.yaml
│   └── fixtures/
│       └── sample.pdf
└── rag-accuracy/
    ├── EVAL.yaml
    └── dataset.jsonl
```

Each eval directory contains:
| File/Directory | Purpose |
|---|---|
| `EVAL.yaml` | Evaluation definition (required) |
| `prompts/` | LLM judge prompts |
| `judges/` | Code judge scripts |
| `fixtures/` | Test data and samples |
| `dataset.jsonl` | Large test datasets |
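For orientation, here is a minimal sketch of what `evals/code-review/EVAL.yaml` might contain, reusing the `assert` schema shown under Shared Resources below. Whether eval-local relative paths like `prompts/quality.md` are accepted, and what other fields exist, is defined in the EVAL Format reference:

```yaml
# Minimal sketch of evals/code-review/EVAL.yaml.
# Assumes the assert schema shown under Shared Resources and that
# prompt paths may be relative to the eval directory (unverified).
assert:
  - name: quality
    type: llm_judge
    prompt: prompts/quality.md
```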
## Discovery

Tools discover evaluations by finding `EVAL.yaml` files:
```sh
agentv list
agentv eval "evals/**/EVAL.yaml"
```

## Shared Resources

Place shared prompts or judges alongside the `evals` directory:
```
project/
├── evals/
│   └── code-review/
│       └── EVAL.yaml
└── shared/
    ├── prompts/
    │   └── safety.md
    └── judges/
        └── format_checker.py
```

Reference shared files with absolute paths from the repo root:
```yaml
assert:
  - name: safety
    type: llm_judge
    prompt: /shared/prompts/safety.md
```
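The tree above also includes a shared code judge (`format_checker.py`), which would presumably be referenced the same way. The keys below are placeholders: this page only documents the `llm_judge` type, so check the EVAL Format reference for the real field names:

```yaml
assert:
  - name: format
    # Hypothetical: neither the type name nor the script key for
    # code judges is documented on this page.
    type: code_judge
    script: /shared/judges/format_checker.py
```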
## Configuration

### `.agentv/targets.yaml`
```yaml
targets:
  - name: default
    provider: anthropic
    model: claude-sonnet-4-20250514

  - name: powerful
    provider: anthropic
    model: claude-opus-4-20250514
```
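A target's `name` is what other configuration refers to; for example, `defaults.target: default` in `.agentv/config.yaml` below selects the first target.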
### `.agentv/config.yaml`

```yaml
eval_patterns:
  - "evals/**/EVAL.yaml"

defaults:
  timeout_seconds: 300
  target: default
```
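The `eval_patterns` globs use the same syntax as the pattern passed explicitly to `agentv eval` in the Discovery section above.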
Section titled “Next Steps”- EVAL Format - File format reference
- Integration - Target configuration
- Patterns - Testing patterns