
Schema Reference

This page provides the complete JSON Schema reference for validating EVAL.yaml files.

The official schema is available at:

  • URL: https://agentevals.io/schema/eval.schema.json
  • Local: spec/schema/eval.schema.json

Add the schema mapping to your settings.json (VS Code with the YAML extension):

```json
{
  "yaml.schemas": {
    "https://agentevals.io/schema/eval.schema.json": ["**/EVAL.yaml", "**/dataset.yaml"]
  }
}
```
Validate from the command line:

```sh
# Using AgentV
agentv validate ./EVAL.yaml

# Using ajv
npx ajv validate -s eval.schema.json -d EVAL.yaml
```
Or validate programmatically with Ajv (this version parses the YAML with js-yaml; any YAML loader works):

```typescript
import Ajv from 'ajv';
import { readFileSync } from 'fs';
import { load } from 'js-yaml';
import schema from './eval.schema.json';

const ajv = new Ajv();
const validate = ajv.compile(schema);

// Parse the YAML file into a plain object before validating
const evalFile = load(readFileSync('./EVAL.yaml', 'utf8'));

if (!validate(evalFile)) {
  console.error(validate.errors);
}
```
```yaml
# EVAL.yaml root structure
name: string                  # Required: Unique identifier
version: string               # Optional: Spec version (default: "1.0")
description: string           # Optional: Human-readable description
metadata: object              # Optional: Custom key-value pairs
execution: ExecutionConfig    # Optional: Default execution settings
evalcases: Evalcase[]         # Required: Array of test cases
```
ExecutionConfig

```yaml
execution:
  target: string              # Target provider name
  timeout_seconds: integer    # Max execution time (1-3600)
  evaluators: Evaluator[]     # Array of evaluator configs
```
Evalcase

```yaml
evalcases:
  - id: string                # Required: Unique identifier
    expected_outcome: string  # Required: Success criteria

    # Input (at least one required)
    input: string | Message[]
    input_messages: Message[]

    # Output (optional)
    expected_output: string | object | Message[]
    expected_messages: Message[]

    # Evaluation (optional)
    rubrics: (string | Rubric)[]
    execution: ExecutionConfig

    # Metadata (optional)
    description: string
    conversation_id: string
    note: string
    metadata: object
```
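Putting the root and evalcase fields together, a minimal EVAL.yaml might look like the following sketch (the name, target, and inputs are illustrative, not taken from the spec):

```yaml
name: capital-cities
description: Checks that the agent answers basic geography questions
execution:
  target: default             # assumed provider name, for illustration
  timeout_seconds: 60
evalcases:
  - id: capital-of-france
    expected_outcome: The answer names Paris as the capital of France.
    input: What is the capital of France?
    expected_output: Paris
```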
Message

```yaml
role: string                  # Required: system | user | assistant | tool
content: string | ContentBlock[]

# For tool messages
tool_call_id: string
name: string

# For assistant messages
tool_calls: ToolCall[]
```
ContentBlock

```yaml
type: string                  # Required: text | file | image | json
value: any                    # Required: Content value
```
ToolCall

```yaml
id: string                    # Required: Unique identifier
type: "function"              # Required: Always "function"
function:
  name: string                # Required: Function name
  arguments: string           # Required: JSON string of arguments
```
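Combining Message and ToolCall, a short tool-use exchange could be written like this (the tool name and arguments are hypothetical, for illustration only):

```yaml
input_messages:
  - role: user
    content: What's the weather in Paris?
  - role: assistant
    content: ""
    tool_calls:
      - id: call_1
        type: "function"
        function:
          name: get_weather            # hypothetical tool name
          arguments: '{"city": "Paris"}'
  - role: tool
    tool_call_id: call_1
    name: get_weather
    content: '{"temp_c": 18}'
```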
Rubric

```yaml
id: string                    # Required: Unique identifier
expected_outcome: string      # Required: What this rubric evaluates
weight: number                # Optional: Scoring weight (default: 1.0)
required: boolean             # Optional: Fail if not met (default: false)
score_ranges:                 # Optional: Analytic scoring
  0: string
  5: string
  10: string
```
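A rubric using analytic score ranges might look like the following sketch (the criterion wording is illustrative):

```yaml
rubrics:
  - id: cites-sources
    expected_outcome: The answer cites at least one source.
    weight: 2.0
    required: true
    score_ranges:
      0: No sources cited
      5: Sources mentioned but not linked
      10: Every claim is linked to a source
```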
Evaluator

```yaml
name: string                  # Required: Unique name
type: string                  # Required: Evaluator type
weight: number                # Optional: Scoring weight
config: object                # Optional: Type-specific config
```
code_judge

```yaml
type: code_judge
script: string[]              # Required: Command to execute
cwd: string                   # Optional: Working directory
```

llm_judge

```yaml
type: llm_judge
prompt: string                # Required: Prompt path or inline
target: string                # Optional: Judge model target
```

rubric

```yaml
type: rubric
rubrics: (string | Rubric)[]  # Required: Criteria
```

composite

```yaml
type: composite
evaluators: Evaluator[]       # Required: Child evaluators
aggregator: Aggregator        # Optional: Aggregation strategy
```

tool_trajectory

```yaml
type: tool_trajectory
mode: string                  # Optional: any_order | in_order | exact
expected: ExpectedToolCall[]  # Optional: Expected calls
minimums: object              # Optional: Minimum counts
```

field_accuracy

```yaml
type: field_accuracy
fields: FieldSpec[]           # Required: Fields to check
aggregation: string           # Optional: weighted_average | minimum | all_or_nothing
```

execution_metrics

```yaml
type: execution_metrics
max_tool_calls: integer
max_llm_calls: integer
max_tokens: integer
max_input_tokens: integer
max_output_tokens: integer
max_cost_usd: number
max_duration_ms: integer
```
Aggregator

```yaml
type: string                  # Required: weighted_average | minimum | maximum | safety_gate | all_or_nothing
weights: object               # Optional: Per-evaluator weights
required: string[]            # Optional: Required evaluators (safety_gate)
threshold: number             # Optional: Threshold (all_or_nothing)
```
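Putting composite evaluators and aggregators together, a weighted composite could be sketched as follows (evaluator names, the prompt path, and the rubric text are illustrative; this sketch assumes type-specific fields sit at the top level of each evaluator, as in the per-type listings above):

```yaml
evaluators:
  - name: quality
    type: composite
    evaluators:
      - name: accuracy
        type: rubric
        rubrics:
          - The answer names Paris as the capital of France.
      - name: style
        type: llm_judge
        prompt: ./prompts/style.md     # assumed prompt path
    aggregator:
      type: weighted_average
      weights:
        accuracy: 0.7
        style: 0.3
```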
ExpectedToolCall

```yaml
tool: string                  # Required: Tool name
args: object | "any"          # Optional: Expected arguments
max_duration_ms: integer      # Optional: Max duration
```
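ExpectedToolCall entries feed the tool_trajectory evaluator's expected list. An in_order expectation might look like this sketch (the tool names are hypothetical):

```yaml
- name: trajectory
  type: tool_trajectory
  mode: in_order
  expected:
    - tool: search_docs              # hypothetical tool name
      args: "any"
    - tool: get_weather
      args:
        city: Paris
      max_duration_ms: 2000
```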
FieldSpec

```yaml
path: string                  # Required: JSON path (dot notation)
match: string                 # Optional: exact | contains | regex | numeric_tolerance | date
required: boolean             # Optional: default false
weight: number                # Optional: default 1.0
tolerance: number             # Optional: For numeric_tolerance
```
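A field_accuracy evaluator checking two fields could be sketched as follows (the field paths are illustrative):

```yaml
- name: fields
  type: field_accuracy
  aggregation: weighted_average
  fields:
    - path: invoice.total            # hypothetical field path
      match: numeric_tolerance
      tolerance: 0.01
      required: true
    - path: invoice.customer.name
      match: exact
      weight: 0.5
```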
Names must satisfy all of the following:

  • 2-64 characters
  • Lowercase alphanumeric characters and hyphens only
  • Starts with a letter
  • Ends with a letter or number
  • No consecutive hyphens

Valid: code-review, rag-accuracy, my-eval-2
Invalid: Code-Review, -invalid, invalid-, my--eval
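As a sanity check, the naming rules can be approximated with a single regular expression. This is a sketch; the schema's own pattern is authoritative:

```typescript
// Approximation of the naming rules: 2-64 chars, lowercase alphanumeric
// plus hyphens, starts with a letter, ends with a letter or number,
// and no consecutive hyphens (enforced by the negative lookahead).
const NAME_RE = /^(?!.*--)[a-z][a-z0-9-]{0,62}[a-z0-9]$/;

console.log(NAME_RE.test('code-review'));  // → true
console.log(NAME_RE.test('my--eval'));     // → false
```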

The following fields are always required:

  • name and evalcases at root
  • id and expected_outcome in evalcases
  • name and type in evaluators

Each evalcase must have at least one of:

  • input (shorthand)
  • input_messages (canonical)
Numeric constraints:

| Field           | Type    | Constraints |
|-----------------|---------|-------------|
| score           | number  | 0.0 - 1.0   |
| weight          | number  | >= 0        |
| timeout_seconds | integer | 1 - 3600    |
| max_cost_usd    | number  | >= 0        |

The complete JSON Schema is available at:

  • GitHub: spec/schema/eval.schema.json
  • CDN: https://agentevals.io/schema/eval.schema.json
```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "$id": "https://agentevals.io/schema/eval.schema.json",
  "title": "AgentEvals EVAL.yaml Schema",
  "type": "object",
  "required": ["name", "evalcases"],
  ...
}
```