Schema Reference
This page provides the complete JSON Schema reference for validating EVAL.yaml files.
Schema Location
The official schema is available at:
- URL: https://agentevals.io/schema/eval.schema.json
- Local: spec/schema/eval.schema.json
Using the Schema
In VS Code
Add to your settings.json:

    {
      "yaml.schemas": {
        "https://agentevals.io/schema/eval.schema.json": ["**/EVAL.yaml", "**/dataset.yaml"]
      }
    }

CLI Validation
    # Using AgentV
    agentv validate ./EVAL.yaml

    # Using ajv (the ajv command is provided by the ajv-cli package)
    npx ajv validate -s eval.schema.json -d EVAL.yaml

In Code
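    import { readFileSync } from 'node:fs';
    import Ajv from 'ajv';
    // Assumption: js-yaml is used to parse the file; any YAML parser works here
    import { load as loadYaml } from 'js-yaml';
    import schema from './eval.schema.json';

    // Compile the schema once and reuse the validator
    const ajv = new Ajv();
    const validate = ajv.compile(schema);

    // Parse the YAML text into a plain object before validating it
    const evalFile = loadYaml(readFileSync('./EVAL.yaml', 'utf8'));
    if (!validate(evalFile)) {
      console.error(validate.errors);
    }

Root Schema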
    # EVAL.yaml root structure
    name: string                  # Required: Unique identifier
    version: string               # Optional: Spec version (default: "1.0")
    description: string           # Optional: Human-readable description
    metadata: object              # Optional: Custom key-value pairs
    execution: ExecutionConfig    # Optional: Default execution settings
    evalcases: Evalcase[]         # Required: Array of test cases

ExecutionConfig

    execution:
      target: string              # Target provider name
      timeout_seconds: integer    # Max execution time (1-3600)
      evaluators: Evaluator[]     # Array of evaluator configs

Evalcase
    evalcases:
      - id: string                  # Required: Unique identifier
        expected_outcome: string    # Required: Success criteria

        # Input (at least one required)
        input: string | Message[]
        input_messages: Message[]

        # Output (optional)
        expected_output: string | object | Message[]
        expected_messages: Message[]

        # Evaluation (optional)
        rubrics: (string | Rubric)[]
        execution: ExecutionConfig

        # Metadata (optional)
        description: string
        conversation_id: string
        note: string
        metadata: object

Message
    role: string                        # Required: system | user | assistant | tool
    content: string | ContentBlock[]

    # For tool messages
    tool_call_id: string
    name: string

    # For assistant messages
    tool_calls: ToolCall[]

ContentBlock

    type: string    # Required: text | file | image | json
    value: any      # Required: Content value

ToolCall

    id: string             # Required: Unique identifier
    type: "function"       # Required: Always "function"
    function:
      name: string         # Required: Function name
      arguments: string    # Required: JSON string of arguments

Rubric

    id: string                  # Required: Unique identifier
    expected_outcome: string    # Required: What this rubric evaluates
    weight: number              # Optional: Scoring weight (default: 1.0)
    required: boolean           # Optional: Fail if not met (default: false)
    score_ranges:               # Optional: Analytic scoring
      0: string
      5: string
      10: string

Evaluator
Base Properties

    name: string      # Required: Unique name
    type: string      # Required: Evaluator type
    weight: number    # Optional: Scoring weight
    config: object    # Optional: Type-specific config

code_judge

    type: code_judge
    script: string[]    # Required: Command to execute
    cwd: string         # Optional: Working directory

llm_judge

    type: llm_judge
    prompt: string    # Required: Prompt path or inline
    target: string    # Optional: Judge model target

rubric

    type: rubric
    rubrics: (string | Rubric)[]    # Required: Criteria

composite

    type: composite
    evaluators: Evaluator[]    # Required: Child evaluators
    aggregator: Aggregator     # Optional: Aggregation strategy

tool_trajectory

    type: tool_trajectory
    mode: string                    # Optional: any_order | in_order | exact
    expected: ExpectedToolCall[]    # Optional: Expected calls
    minimums: object                # Optional: Minimum counts
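This page does not define the three `mode` values any further. As a rough illustration only, the sketch below assumes that `any_order` requires every expected tool to appear somewhere in the actual calls, `in_order` requires the expected tools to appear as a subsequence, and `exact` requires the same tools in the same positions. The `matchTrajectory` function and its types are hypothetical, and the comparison looks at tool names only.

```typescript
// Hypothetical types; only `tool` and the mode names mirror the schema above.
type TrajectoryMode = "any_order" | "in_order" | "exact";

interface ExpectedToolCall {
  tool: string;
}

// Assumed semantics (not confirmed by the spec):
//   any_order: every expected tool appears somewhere in the actual calls
//   in_order:  expected tools appear as a subsequence of the actual calls
//   exact:     actual calls are exactly the expected tools, in order
function matchTrajectory(
  mode: TrajectoryMode,
  expected: ExpectedToolCall[],
  actualTools: string[],
): boolean {
  const wanted = expected.map((e) => e.tool);
  switch (mode) {
    case "any_order": {
      // Consume each actual call at most once so repeated tools are counted.
      const remaining = [...actualTools];
      return wanted.every((tool) => {
        const index = remaining.indexOf(tool);
        if (index === -1) return false;
        remaining.splice(index, 1);
        return true;
      });
    }
    case "in_order": {
      // Advance through the expected list whenever the next tool shows up.
      let next = 0;
      for (const tool of actualTools) {
        if (next < wanted.length && tool === wanted[next]) next++;
      }
      return next === wanted.length;
    }
    case "exact":
      return (
        wanted.length === actualTools.length &&
        wanted.every((tool, i) => tool === actualTools[i])
      );
  }
}
```

Under these assumptions, an extra logging call between two expected tools still satisfies `in_order` but not `exact`.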
field_accuracy

    type: field_accuracy
    fields: FieldSpec[]    # Required: Fields to check
    aggregation: string    # Optional: weighted_average | minimum | all_or_nothing

execution_metrics

    type: execution_metrics
    max_tool_calls: integer
    max_llm_calls: integer
    max_tokens: integer
    max_input_tokens: integer
    max_output_tokens: integer
    max_cost_usd: number
    max_duration_ms: integer

Aggregator

    type: string          # Required: weighted_average | minimum | maximum | safety_gate | all_or_nothing
    weights: object       # Optional: Per-evaluator weights
    required: string[]    # Optional: Required evaluators (safety_gate)
    threshold: number     # Optional: Threshold (all_or_nothing)
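The aggregation strategies are only named here, not specified. The following sketch shows one plausible reading of four of them, assuming child scores lie in the 0.0 - 1.0 range, a missing weight counts as 1, and `all_or_nothing` returns 1 only when every score reaches `threshold` (assumed to default to 1). `safety_gate` is left out because this page does not say what counts as passing for a required evaluator. The `aggregate` function and its types are hypothetical.

```typescript
// Hypothetical types; field names follow the Aggregator schema above.
interface Aggregator {
  type: "weighted_average" | "minimum" | "maximum" | "all_or_nothing";
  weights?: Record<string, number>;
  threshold?: number;
}

// Maps evaluator name to its score, assumed to be in [0, 1].
type Scores = Record<string, number>;

function aggregate(agg: Aggregator, scores: Scores): number {
  const entries = Object.entries(scores);
  if (entries.length === 0) return 0; // assumption: nothing to aggregate scores 0
  const values = entries.map(([, score]) => score);
  switch (agg.type) {
    case "minimum":
      return Math.min(...values);
    case "maximum":
      return Math.max(...values);
    case "all_or_nothing": {
      const threshold = agg.threshold ?? 1; // assumed default
      return values.every((score) => score >= threshold) ? 1 : 0;
    }
    case "weighted_average": {
      let total = 0;
      let weightSum = 0;
      for (const [name, score] of entries) {
        const weight = agg.weights?.[name] ?? 1; // assumed default weight
        total += weight * score;
        weightSum += weight;
      }
      return weightSum === 0 ? 0 : total / weightSum;
    }
  }
}
```

For example, with weights `{ a: 3, b: 1 }` and scores `{ a: 1, b: 0 }`, the weighted average is 3 / 4 = 0.75, while `minimum` yields 0.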
ExpectedToolCall

    tool: string                # Required: Tool name
    args: object | "any"        # Optional: Expected arguments
    max_duration_ms: integer    # Optional: Max duration

FieldSpec

    path: string         # Required: JSON path (dot notation)
    match: string        # Optional: exact | contains | regex | numeric_tolerance | date
    required: boolean    # Optional: default false
    weight: number       # Optional: default 1.0
    tolerance: number    # Optional: For numeric_tolerance
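To make the `path` and `match` fields more concrete, here is a minimal sketch of how a single FieldSpec might be checked against an expected and an actual object. It assumes that `match` defaults to `exact`, that `contains` and `regex` apply to string values with the expected value acting as the substring or pattern, and that `date` compares parsed timestamps. None of this is confirmed by the schema, and `getPath` and `fieldMatches` are hypothetical helpers.

```typescript
type MatchKind = "exact" | "contains" | "regex" | "numeric_tolerance" | "date";

// Hypothetical subset of FieldSpec; names follow the schema above.
interface FieldSpec {
  path: string;
  match?: MatchKind;
  tolerance?: number;
}

// Walk a dot-notation path such as "invoice.total".
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function fieldMatches(spec: FieldSpec, expected: unknown, actual: unknown): boolean {
  const want = getPath(expected, spec.path);
  const got = getPath(actual, spec.path);
  if (got === undefined) return false; // field is missing from the actual output
  switch (spec.match ?? "exact") { // assumed default
    case "exact":
      // Structural comparison via JSON; sensitive to key order in objects.
      return JSON.stringify(want) === JSON.stringify(got);
    case "contains":
      return typeof want === "string" && typeof got === "string" && got.includes(want);
    case "regex":
      return typeof want === "string" && typeof got === "string" && new RegExp(want).test(got);
    case "numeric_tolerance":
      return (
        typeof want === "number" &&
        typeof got === "number" &&
        Math.abs(got - want) <= (spec.tolerance ?? 0)
      );
    case "date":
      return (
        typeof want === "string" &&
        typeof got === "string" &&
        !Number.isNaN(Date.parse(want)) &&
        Date.parse(want) === Date.parse(got)
      );
  }
}
```

A spec such as `{ path: "invoice.total", match: "numeric_tolerance", tolerance: 0.5 }` would then accept an actual total of 100.4 against an expected 100.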
Validation Rules
Name Format
Section titled “Name Format”- 2-64 characters
- Lowercase alphanumeric + hyphens
- Must start with letter
- Must end with letter or number
- No consecutive hyphens
Valid: code-review, rag-accuracy, my-eval-2
Invalid: Code-Review, -invalid, invalid-, my--eval
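The name rules above can be expressed as a length check plus one regular expression. The sketch below is an illustration, not the pattern used by the official schema; `isValidName` is a hypothetical helper.

```typescript
// One leading lowercase letter, then one or more groups of
// "optional single hyphen + lowercase letter or digit".
// This forces a letter/digit at the end and rules out consecutive hyphens.
const NAME_PATTERN = /^[a-z](?:-?[a-z0-9])+$/;

function isValidName(name: string): boolean {
  return name.length >= 2 && name.length <= 64 && NAME_PATTERN.test(name);
}

const samples = ["code-review", "rag-accuracy", "my-eval-2", "Code-Review", "-invalid", "invalid-", "my--eval"];
for (const sample of samples) {
  console.log(sample, isValidName(sample));
}
```

The first three samples pass and the last four are rejected, matching the Valid and Invalid lists above.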
Required Fields
- name and evalcases at root
- id and expected_outcome in evalcases
- name and type in evaluators
Input Requirements
Each evalcase must have at least one of:
- input (shorthand)
- input_messages (canonical)
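As an illustration of the required-field and input rules for a single evalcase, the following sketch collects human-readable problems instead of stopping at the first one. `checkEvalcase` is a hypothetical helper and does not replace validation against the schema itself.

```typescript
// Hypothetical helper: returns a list of problems found in one evalcase.
function checkEvalcase(evalcase: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // id and expected_outcome are required strings.
  for (const field of ["id", "expected_outcome"]) {
    if (typeof evalcase[field] !== "string") {
      problems.push(`missing required field: ${field}`);
    }
  }

  // At least one of input (shorthand) or input_messages (canonical).
  const hasInput = evalcase["input"] !== undefined;
  const hasInputMessages = Array.isArray(evalcase["input_messages"]);
  if (!hasInput && !hasInputMessages) {
    problems.push("needs at least one of: input, input_messages");
  }

  return problems;
}
```

An evalcase with only an `id` would produce two problems here: the missing `expected_outcome` and the missing input.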
Type Constraints
| Field | Type | Constraints |
|---|---|---|
| score | number | 0.0 - 1.0 |
| weight | number | >= 0 |
| timeout_seconds | integer | 1 - 3600 |
| max_cost_usd | number | >= 0 |
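The table translates directly into range checks. The sketch below encodes the four rows as data and tests a value against them; the `CONSTRAINTS` table and `checkConstraint` function are hypothetical names, and fields not listed in the table are treated as unconstrained.

```typescript
interface NumericRule {
  integer: boolean;
  min: number;
  max: number;
}

// One entry per row of the Type Constraints table.
const CONSTRAINTS: Record<string, NumericRule> = {
  score: { integer: false, min: 0, max: 1 },
  weight: { integer: false, min: 0, max: Infinity },
  timeout_seconds: { integer: true, min: 1, max: 3600 },
  max_cost_usd: { integer: false, min: 0, max: Infinity },
};

function checkConstraint(field: string, value: number): boolean {
  const rule = CONSTRAINTS[field];
  if (rule === undefined) return true; // assumption: unlisted fields are unconstrained
  if (Number.isNaN(value)) return false;
  if (rule.integer && !Number.isInteger(value)) return false;
  return value >= rule.min && value <= rule.max;
}
```

For instance, `timeout_seconds: 3600` is accepted, while `timeout_seconds: 0` and `timeout_seconds: 1.5` are both rejected.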
Full Schema
The complete JSON Schema is available at:
- GitHub: spec/schema/eval.schema.json
- CDN: https://agentevals.io/schema/eval.schema.json

    {
      "$schema": "https://json-schema.org/draft-07/schema#",
      "$id": "https://agentevals.io/schema/eval.schema.json",
      "title": "AgentEvals EVAL.yaml Schema",
      "type": "object",
      "required": ["name", "evalcases"],
      ...
    }

Next Steps
- Glossary - Terminology
- EVAL Format - Usage guide
- Examples - Example files