Declarative Format
Define evaluations in simple YAML files. No complex code required.
7 Evaluator Types
Code judges, LLM judges, rubrics, composites, tool trajectory, and more.
Flexible Organization
Centralized or skill-based patterns - your choice.
Industry Standard
Based on production implementations. Built for adoption.
    name: code-review
    version: "1.0"
    execution:
      evaluators:
        - name: quality
          type: llm_judge
          prompt: ./prompts/quality.md
    evalcases:
      - id: detect-bug
        expected_outcome: Identifies the loop condition bug
        input:
          - role: user
            content: "Review this code..."
        rubrics:
          - Identifies the bug
          - Provides correct fix

| Type | Description |
|---|---|
| code_judge | Execute custom scripts for deterministic checks |
| llm_judge | LLM-based semantic evaluation |
| rubric | Structured criteria with weights |
| composite | Combine multiple evaluators |
| tool_trajectory | Validate agent tool usage |
| field_accuracy | Check structured data fields |
| execution_metrics | Latency, cost, token limits |
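
To make three of the evaluator types above more concrete, here is a minimal Python sketch of how a deterministic `code_judge`-style check, a weighted `rubric`-style check, and a `composite` that combines them could work in principle. It is not the AgentEvals or AgentV API: the names `contains_fix`, `rubric_score`, and `Composite`, the keyword-matching logic, and the 0-to-1 score range are all assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not part of AgentEvals or AgentV.
# An evaluator maps a candidate answer to a score in [0, 1].
Evaluator = Callable[[str], float]


def contains_fix(answer: str) -> float:
    """code_judge-style check: deterministic, no model call involved."""
    return 1.0 if "i < len(items)" in answer else 0.0


def rubric_score(answer: str, criteria: Dict[str, float]) -> float:
    """rubric-style check: weighted share of criteria whose keyword appears.

    Keyword matching is a stand-in for whatever judgement a real rubric
    evaluator would apply to each criterion.
    """
    total = sum(criteria.values())
    lowered = answer.lower()
    met = sum(weight for keyword, weight in criteria.items() if keyword in lowered)
    return met / total if total else 0.0


@dataclass
class Composite:
    """composite-style evaluator: weighted average of its members' scores."""

    members: List[Tuple[Evaluator, float]]  # (evaluator, weight) pairs

    def __call__(self, answer: str) -> float:
        total = sum(weight for _, weight in self.members)
        if not total:
            return 0.0
        return sum(ev(answer) * weight for ev, weight in self.members) / total


if __name__ == "__main__":
    answer = "The loop condition is off by one; use i < len(items) to fix the bug."
    rubric = partial(rubric_score, criteria={"bug": 2.0, "fix": 1.0})
    evaluator = Composite([(contains_fix, 1.0), (rubric, 1.0)])
    print(evaluator(answer))
```

Treating every evaluator as a plain callable that returns a score keeps the composite trivial to write. A real implementation would presumably also carry evaluator names, reasons, and pass thresholds.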
AgentV is the canonical implementation of the AgentEvals standard, providing CLI tools for running evaluations.
AgentEvals is an open specification. Contributions welcome on GitHub.