# Organization
Each evaluation lives in its own directory with an `EVAL.yaml` entry point.
## Structure

```
evals/
├── code-review/
│   ├── EVAL.yaml
│   ├── prompts/
│   │   └── quality.md
│   └── judges/
│       └── syntax.py
├── document-extraction/
│   ├── EVAL.yaml
│   └── fixtures/
│       └── sample.pdf
└── rag-accuracy/
    ├── EVAL.yaml
    └── dataset.jsonl
```

Each eval directory contains:
| File/Directory | Purpose |
|---|---|
| `EVAL.yaml` | Evaluation definition (required) |
| `prompts/` | LLM judge prompts |
| `judges/` | Code judge scripts |
| `fixtures/` | Test data and samples |
| `dataset.jsonl` | Large test datasets |
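For orientation, here is a minimal sketch of what `evals/code-review/EVAL.yaml` might contain, reusing the `assert` schema shown under Shared Resources below. Whether eval-local relative paths like `prompts/quality.md` are accepted, and what other fields exist, is defined in the EVAL Format reference:

```yaml
# Minimal sketch of evals/code-review/EVAL.yaml.
# Assumes the assert schema shown under Shared Resources and that
# prompt paths may be relative to the eval directory (unverified).
assert:
  - name: quality
    type: llm_judge
    prompt: prompts/quality.md
```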
## Discovery

Tools discover evaluations by finding `EVAL.yaml` files:
```sh
agentv list
agentv eval "evals/**/EVAL.yaml"
```

## Shared Resources

Place shared prompts or judges alongside the `evals` directory:
```
project/
├── evals/
│   └── code-review/
│       └── EVAL.yaml
└── shared/
    ├── prompts/
    │   └── safety.md
    └── judges/
        └── format_checker.py
```

Reference shared files with absolute paths from the repo root:
```yaml
assert:
  - name: safety
    type: llm_judge
    prompt: /shared/prompts/safety.md
```
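The tree above also includes a shared code judge (`format_checker.py`), which would presumably be referenced the same way. The keys below are placeholders: this page only documents the `llm_judge` type, so check the EVAL Format reference for the real field names:

```yaml
assert:
  - name: format
    # Hypothetical: neither the type name nor the script key for
    # code judges is documented on this page.
    type: code_judge
    script: /shared/judges/format_checker.py
```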
## Configuration

### `.agentv/targets.yaml`
```yaml
targets:
  - name: default
    provider: anthropic
    model: claude-sonnet-4-20250514

  - name: powerful
    provider: anthropic
    model: claude-opus-4-20250514
```
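A target's `name` is what other configuration refers to; for example, `defaults.target: default` in `.agentv/config.yaml` below selects the first target.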
### `.agentv/config.yaml`

```yaml
eval_patterns:
  - "evals/**/EVAL.yaml"

defaults:
  timeout_seconds: 300
  target: default
```
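The `eval_patterns` globs use the same syntax as the pattern passed explicitly to `agentv eval` in the Discovery section above.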
Section titled “Next Steps”- EVAL Format - File format reference
- Integration - Target configuration
- Patterns - Testing patterns