# Automatic Evaluations

> Analyze your agents' performance through automatic evaluations generated by Devic, based on the LLM-as-Judge approach.

# Agent Performance

Devic includes an **automatic performance evaluation system** that analyzes agent behavior at the end of each execution.\
This module applies predefined metrics that allow you to objectively measure accuracy, planning, execution, and task completion.

<img src="https://mintcdn.com/devic/GaXx4bIb3UxCHJyI/evaluations.png?fit=max&auto=format&n=GaXx4bIb3UxCHJyI&q=85&s=5599c6465ef93ac7e128e260f85d14fb" alt="Automatic execution evaluation panel" width="1915" height="983" data-path="evaluations.png" />

***

## Predefined Evaluations

These evaluations are **preconfigured** on the platform and cover key indicators of agent performance:

| Indicator                 | Description                                                                |
| ------------------------- | -------------------------------------------------------------------------- |
| **Instruction Following** | Evaluates the degree to which the agent follows the provided instructions. |
| **Task Planning**         | Measures the quality and coherence of task planning.                       |
| **Task Execution**        | Analyzes the accuracy and consistency of execution.                        |
| **Tool Usage**            | Evaluates the efficient use of the available tools.                        |
| **Finalization**          | Verifies that the agent properly completes the workflow.                   |

Each metric is scored on a scale from **0 to 10**, and the individual scores feed into an **Overall Score** that summarizes the execution’s performance.
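
As a rough illustration of how per-metric scores can roll up into a single number, the sketch below takes the unweighted mean of the five predefined indicators. This page does not state the formula Devic uses, so the plain mean, the metric identifiers, and the `overall_score` function are all assumptions made for the example.

```python
from statistics import mean

# The five predefined indicators from the table above (identifiers are hypothetical).
PREDEFINED_METRICS = (
    "instruction_following",
    "task_planning",
    "task_execution",
    "tool_usage",
    "finalization",
)


def overall_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the five metric scores, rounded to one decimal."""
    for name in PREDEFINED_METRICS:
        if name not in scores:
            raise ValueError(f"missing score for {name!r}")
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name!r} must be between 0 and 10, got {scores[name]}")
    return round(mean(scores[name] for name in PREDEFINED_METRICS), 1)


example = {
    "instruction_following": 9,
    "task_planning": 8,
    "task_execution": 7,
    "tool_usage": 8,
    "finalization": 10,
}
print(overall_score(example))  # 8.4
```

Validating the 0 to 10 range before averaging keeps a malformed score from silently skewing the summary.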

***

## Custom Evaluations

In addition to the predefined metrics, you can **create your own custom evaluations** tailored to your organization's specific goals or criteria.

These configurations are managed in the section:

**Other Options → Evaluation Configuration**

👉 [View custom evaluation configuration](https://devic.mintlify.app/devic/agents/other-options)

There you can define new criteria, adjust weights, or incorporate additional indicators according to your operational needs.
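
To make the effect of adjusting weights concrete, here is a minimal sketch of a weighted mean over a mix of predefined and custom criteria. It is not Devic's configuration format or scoring code: the `weighted_overall` function, the criterion identifiers, and the `tone_compliance` criterion are hypothetical.

```python
def weighted_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores; weights need not sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and not all zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# "tone_compliance" stands in for a custom, organization-specific criterion.
scores = {"instruction_following": 8, "tool_usage": 6, "tone_compliance": 10}
weights = {"instruction_following": 2, "tool_usage": 1, "tone_compliance": 1}
print(weighted_overall(scores, weights))  # 8.0
```

Doubling the weight of one criterion pulls the result toward that criterion's score. With equal weights the function reduces to a plain average.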

***

## LLM as Judge

Devic implements the **LLM-as-Judge** approach, where **an additional language model acts as the evaluator** of the agent's performance.\
This model analyzes the generated results, assesses the coherence of the agent's actions, and assigns scores based on the defined criteria.

Thanks to this system, evaluations are:

* **Objective**, as they come from an evaluator external to the executing agent.
* **Consistent**, applying the same analysis rules in every execution.
* **Automated**, eliminating the need for manual review.
* **Explanatory**, providing interpretative summaries that describe strengths and areas for improvement.
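
The general LLM-as-Judge pattern can be sketched as follows: build a prompt that lists the criteria, send it to a separate evaluator model, and validate the structured reply. This is an illustration of the technique, not Devic's implementation. The prompt wording, the JSON shape, and the `judge_model` callable are assumptions, and `fake_judge` is a stub standing in for a real model call.

```python
import json
from typing import Callable

# Criteria taken from the predefined indicators; identifiers are hypothetical.
CRITERIA = {
    "instruction_following": "Degree to which the agent follows the provided instructions.",
    "task_planning": "Quality and coherence of task planning.",
    "task_execution": "Accuracy and consistency of execution.",
    "tool_usage": "Efficient use of the available tools.",
    "finalization": "Whether the agent properly completes the workflow.",
}


def build_judge_prompt(transcript: str) -> str:
    """Assemble the evaluator prompt from the criteria and the execution transcript."""
    criteria_lines = "\n".join(f"- {name}: {desc}" for name, desc in CRITERIA.items())
    return (
        "You are evaluating an AI agent's execution.\n"
        "Score each criterion from 0 to 10 and reply with JSON only, shaped as "
        '{"scores": {"<criterion>": <number>}, "summary": "<text>"}.\n\n'
        f"Criteria:\n{criteria_lines}\n\nExecution transcript:\n{transcript}"
    )


def evaluate(transcript: str, judge_model: Callable[[str], str]) -> dict:
    """Ask the judge model for scores and reject replies that break the contract."""
    reply = json.loads(judge_model(build_judge_prompt(transcript)))
    scores = reply["scores"]
    for name in CRITERIA:
        value = scores.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 10:
            raise ValueError(f"invalid score for {name!r}: {value!r}")
    return {
        "scores": {name: float(scores[name]) for name in CRITERIA},
        "summary": str(reply.get("summary", "")),
    }


def fake_judge(prompt: str) -> str:
    """Stub evaluator so the sketch runs without calling a real model."""
    return json.dumps({
        "scores": {name: 8 for name in CRITERIA},
        "summary": "Followed instructions; tool usage could be tighter.",
    })


result = evaluate("user: book a meeting\nagent: called calendar tool ...", fake_judge)
print(result["scores"]["tool_usage"], "-", result["summary"])
```

Passing the evaluator in as a callable keeps the judge separate from the executing agent. Validating its reply guards against malformed or out-of-range scores.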

<img src="https://mintcdn.com/devic/GaXx4bIb3UxCHJyI/evaluations-detail.png?fit=max&auto=format&n=GaXx4bIb3UxCHJyI&q=85&s=f4e15c5cabcd80f66888375199f6c75c" alt="Evaluation with overall score and performance summary" width="1915" height="983" data-path="evaluations-detail.png" />

***

## Result Interpretation

The evaluation panel displays a detailed summary that includes:

* **Overall Performance:** general rating (for example, *Excellent*, *Good*, *Needs Improvement*).
* **Summary:** textual analysis generated by the evaluating model with observations on performance.
* **Strong Areas:** number of highlighted strengths.
* **Areas to Improve:** number of improvement points detected.

Additionally, the **“Get Suggestions”** button allows you to request automatic recommendations to optimize agent behavior in future executions.

***

<Note>
  Devic’s automatic evaluation system combines the precision of quantitative analysis with the qualitative interpretation of a language model, providing a comprehensive view of agent performance.
</Note>

***

## Next Steps

<Card title="Costs" icon="coins" href="/devic/agents/continuous-optimization/costs">
  Monitor token consumption, analyze execution costs, and optimize model and resource usage.
</Card>
