Datasets
Datasets are collections of input/expected output samples for conducting experiments and non regression tests.
A Dataset is made of items. A Dataset item has an input
, expected output
and can contain metadata
. The input, expected output and metadata of the items in a dataset should follow the same schema.
Dataset Types
There are two types of datasets in Literal AI: Key-Value and Generation.
Key-Value Datasets
Key-Value datasets can have any key-value pairs for input and expected output. This type of dataset can be used to for example store Runs of agents. An example of a Key-Value dataset item:
Generation Datasets
Generation datasets are a type of dataset that follow the OpenAI message format. An example of a Generation Dataset Item:
Create a Dataset
From File
You can upload a dataset on Literal AI from CSV (Key-Value datasets) or JSONL (Generation datasets).
To do so, go to the Datasets page and click on the +
button to create a new dataset. Then drag and drop your file to upload it.
Uploading a Key-Value Dataset
From Code
You can also create a Dataset
and populate it with items programmatically using SDKs.
Here is an example of how to create a dataset:
Now that we have a Dataset
, we can create dataset items:
From existing logs
A benefit of managing your datasets and production logs in Literal AI is that you can continuously improve your app by using the logs to create new dataset items.
Create a Dataset Item from logs
Whenever you identify a new edge case in your logs, you can create a dataset item to account for it in your experiments. By editing the expected output to match the desired output, you ensure that your future experiments will account for this particular issue.
Get a dataset
Once you have created a dataset, you can retrieve it using the SDK to use it in your experiments or your CI.
Was this page helpful?