Online Evals
Automatically evaluate your LLM logs in production, monitor performance and detect issues.
From the Literal AI Platform
Automating the evaluation of your Run outputs or LLM generations helps you detect patterns and identify areas of improvement for your LLM app in production, especially with large volumes of data.
An Online Eval is composed of:
- Name: A name to identify the rule.
- Log Type: Either Agent Run or LLM Generation; this is the target to evaluate.
- Sample Rate: The percentage of logs to evaluate.
- Filters: Additional conditions to selectively evaluate certain logs.
- Scorer: The scorer to use for the evaluation.
To create an Online Eval, go to the Online Evals page and click on the + button in the upper right corner of the table.
Create Online Eval
Once the Online Eval is in place, your Runs or LLM Generations are automatically evaluated.
You can check the distribution of scores on an Online Eval’s page:
Online Eval Scores Distribution
If an Online Eval fails on a Run or LLM Generation, the Log column shows the error message.
From the SDKs
The SDKs provide Score creation APIs with all fields exposed.
If your metrics are code-based or combine LLM calls with arithmetic operations, like Ragas, you can directly use the SDKs to create scores from your application code.
Scores must be tied to either a Step or a Generation object.
The concept of Score on a Thread is not well-defined at this stage.
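As an example, a minimal sketch with the Python SDK might look like the following. The client and method names used here (`LiteralClient`, `client.api.create_score`) and the parameters shown, in particular the `type` value, are assumptions based on typical usage; check the SDK reference for the exact signatures.

```python
# Minimal sketch of creating a Score from application code.
# Method and parameter names are assumptions — see the SDK reference.
from literalai import LiteralClient

client = LiteralClient()  # assumes LITERAL_API_KEY is set in the environment

def exact_match(expected: str, actual: str) -> float:
    """A simple code-based metric: 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def score_generation(step_id: str, expected: str, actual: str) -> None:
    """Compute the metric in application code, then attach it as a Score
    to the Step (or Generation) identified by step_id."""
    client.api.create_score(
        step_id=step_id,       # the Step/Generation the score is tied to
        name="exact-match",    # metric name as it appears in the platform
        type="CODE",           # assumed score type for code-based metrics
        value=exact_match(expected, actual),
        comment="Deterministic string comparison",
    )
```

The same pattern applies to composite metrics such as Ragas: compute the value in your code, then push it as a Score tied to the relevant Step or Generation.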
Automation of actions based on evaluation results is coming soon!