
Organization

Each evaluation lives in its own directory with an EVAL.yaml entry point.

```
evals/
├── code-review/
│   ├── EVAL.yaml
│   ├── prompts/
│   │   └── quality.md
│   └── judges/
│       └── syntax.py
├── document-extraction/
│   ├── EVAL.yaml
│   └── fixtures/
│       └── sample.pdf
└── rag-accuracy/
    ├── EVAL.yaml
    └── dataset.jsonl
```

Each eval directory contains:

| File/Directory | Purpose |
| --- | --- |
| `EVAL.yaml` | Evaluation definition (required) |
| `prompts/` | LLM judge prompts |
| `judges/` | Code judge scripts |
| `fixtures/` | Test data and samples |
| `dataset.jsonl` | Large test datasets |
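As an illustration, a minimal `EVAL.yaml` for the `code-review` eval above could wire these pieces together. The field names below mirror the config fragments elsewhere in this doc and are assumptions, not a confirmed schema:

```yaml
# Hypothetical EVAL.yaml sketch; field names follow the snippets
# in this doc and are not a verified agentv schema.
name: code-review
assert:
  - name: quality
    type: llm_judge
    prompt: prompts/quality.md
  - name: syntax
    type: code_judge
    script: judges/syntax.py
```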

Tools discover evaluations by finding EVAL.yaml files:

```sh
agentv list
agentv eval "evals/**/EVAL.yaml"
```
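Discovery amounts to expanding that glob against the repository. A minimal sketch of the same lookup using Python's `pathlib` (illustrative only — not agentv's actual implementation):

```python
from pathlib import Path

def discover_evals(root: str, pattern: str = "evals/**/EVAL.yaml") -> list[str]:
    """Return every EVAL.yaml path under root matching the glob pattern."""
    return sorted(str(p) for p in Path(root).glob(pattern))
```

The `**` segment recurses into subdirectories, so nested evals are found without listing each directory explicitly.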

Place shared prompts or judges alongside the evals directory:

```
project/
├── evals/
│   └── code-review/
│       └── EVAL.yaml
└── shared/
    ├── prompts/
    │   └── safety.md
    └── judges/
        └── format_checker.py
```
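The shared `format_checker.py` is not shown in this doc; as one plausible sketch, a shared code judge might verify that model output is a well-formed JSON object:

```python
import json

def check_format(output: str) -> bool:
    """Pass only if the model output parses as a JSON object."""
    try:
        return isinstance(json.loads(output), dict)
    except json.JSONDecodeError:
        return False
```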

Reference them with absolute paths from the repo root:

```yaml
assert:
  - name: safety
    type: llm_judge
    prompt: /shared/prompts/safety.md
targets:
  - name: default
    provider: anthropic
    model: claude-sonnet-4-20250514
  - name: powerful
    provider: anthropic
    model: claude-opus-4-20250514
eval_patterns:
  - "evals/**/EVAL.yaml"
defaults:
  timeout_seconds: 300
  target: default
```
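The `safety` assertion above delegates to an LLM judge prompt; a code judge like `judges/syntax.py` from the earlier layout checks the output programmatically instead. A sketch, assuming the judge receives the model output as text and returns a verdict dict — the real judge interface is not specified in this doc:

```python
import ast

def judge(output: str) -> dict:
    """Verdict for whether the model output is syntactically valid Python."""
    try:
        ast.parse(output)
        return {"pass": True, "reason": "valid Python syntax"}
    except SyntaxError as exc:
        return {"pass": False, "reason": f"line {exc.lineno}: {exc.msg}"}
```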