A Dataset is made of items. Each Dataset item has an input and an expected output, and can optionally contain metadata. All items in a dataset should follow the same schema for their input, expected output and metadata.
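
The "same schema" requirement can be checked locally before items are uploaded. The following is a minimal sketch, not part of Literal AI: the helper names `schema_of` and `inconsistent_items` are hypothetical, and items are assumed to be plain dictionaries with `input`, `expected_output` and `metadata` keys.

```python
def schema_of(value):
    """Describe a value by its keys and leaf types, ignoring the actual data."""
    if isinstance(value, dict):
        return {key: schema_of(inner) for key, inner in value.items()}
    if isinstance(value, list):
        # Simplification: a list is described by its first element only.
        return [schema_of(value[0])] if value else []
    return type(value).__name__


def inconsistent_items(items):
    """Return the indexes of items whose schema differs from the first item's."""
    if not items:
        return []
    reference = schema_of(items[0])
    return [i for i, item in enumerate(items) if schema_of(item) != reference]
```

An empty result means every item has the same shape as the first one; any returned index points at an item to fix before adding it to the dataset.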

Dataset Types

There are two types of datasets in Literal AI: Key-Value and Generation.

Key-Value Datasets

Key-Value datasets accept arbitrary key-value pairs for input and expected output. This type of dataset can be used, for example, to store Runs of agents. An example of a Key-Value dataset item:

input = {
  "query": "Can you name a movie about space travel?"
}

expected_output = { 
  "response": "A movie about space travel is \"Interstellar\"." 
}

Generation Datasets

Generation datasets follow the OpenAI message format. An example of a Generation dataset item:

input = {
  "messages": [{
    "role": "system",
    "content": "You are a helpful assistant."
  }, {
    "role": "user",
    "content": "Can you name a movie about space travel?"
  }]
}

expected_output = { 
  "role": "assistant",
  "content": "A movie about space travel is \"Interstellar\"." 
}
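
A quick local check can catch items that do not match this shape. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the role list is a simplified subset of what the OpenAI format allows, and `content` is assumed to be a plain string (the real format also permits other content types).

```python
# Simplified subset of roles; extend it if your data uses other roles.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def is_message(obj):
    """True if obj looks like a single chat message with a role and text content."""
    return (
        isinstance(obj, dict)
        and obj.get("role") in ALLOWED_ROLES
        and isinstance(obj.get("content"), str)
    )


def is_generation_item(item_input, expected_output):
    """True if input holds a non-empty message list and the expected output is a message."""
    if not isinstance(item_input, dict):
        return False
    messages = item_input.get("messages")
    return (
        isinstance(messages, list)
        and len(messages) > 0
        and all(is_message(message) for message in messages)
        and is_message(expected_output)
    )
```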

Create a Dataset

From File

You can upload a dataset to Literal AI from a CSV file (Key-Value datasets) or a JSONL file (Generation datasets).

To do so, go to the Datasets page and click on the + button to create a new dataset. Then drag and drop your file to upload it.

Uploading a Key-Value Dataset
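
For Generation datasets, a JSONL file can be produced with the Python standard library. This is only a sketch: the exact field names Literal AI expects on each line are not specified here, so the layout below (one item per line, with `input` and `expected_output` keys mirroring the example above) is an assumption to verify against an exported file or the upload dialog. The helper name `write_jsonl` is hypothetical.

```python
import json


def write_jsonl(path, items):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
            count += 1
    return count


# Assumed item layout, mirroring the Generation example above.
example_items = [
    {
        "input": {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Can you name a movie about space travel?"},
            ]
        },
        "expected_output": {
            "role": "assistant",
            "content": 'A movie about space travel is "Interstellar".',
        },
    }
]
```

Calling `write_jsonl("generation_dataset.jsonl", example_items)` writes a file that can then be dragged into the upload dialog.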

From Code

You can also create a Dataset and populate it with items programmatically using the SDKs.

See Installation to get your API key and instantiate the SDKs.

Here is an example of how to create a dataset:
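
A minimal sketch in Python, assuming the SDK client exposes `api.create_dataset(name=..., description=...)`; check the SDK reference for the exact signature. The wrapper function `create_demo_dataset` and the dataset name are hypothetical and used only for illustration.

```python
def create_demo_dataset(client):
    """Create an empty dataset through an already instantiated SDK client."""
    # Assumed SDK call: client.api.create_dataset(name=..., description=...)
    return client.api.create_dataset(
        name="movie-questions",
        description="Questions about movies with reference answers.",
    )


# Usage (needs the SDK installed and an API key, so it is left commented out):
# import os
# from literalai import LiteralClient
# client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])
# dataset = create_demo_dataset(client)
```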

Now that we have a Dataset, we can create dataset items:
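
A minimal sketch in Python, assuming the SDK client exposes `api.create_dataset_item(dataset_id=..., input=..., expected_output=...)`; verify the name and parameters against the SDK reference. The wrapper `add_movie_item` is hypothetical, and the item reuses the Key-Value example from earlier on this page.

```python
def add_movie_item(client, dataset_id):
    """Add one Key-Value item to the dataset identified by dataset_id."""
    # Assumed SDK call: client.api.create_dataset_item(dataset_id=..., input=..., expected_output=...)
    return client.api.create_dataset_item(
        dataset_id=dataset_id,
        input={"query": "Can you name a movie about space travel?"},
        expected_output={
            "response": 'A movie about space travel is "Interstellar".'
        },
    )
```

Every item added this way should use the same keys (`query` and `response` here) so that the dataset keeps a single schema.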

From Existing Logs

A benefit of managing your datasets and production logs in Literal AI is that you can continuously improve your app by using the logs to create new dataset items.

Create a Dataset Item from logs

Whenever you identify a new edge case in your logs, you can create a dataset item to account for it in your experiments. By editing the expected output to match the desired output, you ensure that your future experiments will account for this particular issue.

Get a Dataset

Once you have created a dataset, you can retrieve it using the SDK to use it in your experiments or your CI.
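
A minimal sketch in Python, assuming the SDK client exposes `api.get_dataset(id=...)` and that it returns `None` when no dataset matches; both points are assumptions to confirm in the SDK reference. The wrapper `fetch_dataset` is hypothetical.

```python
def fetch_dataset(client, dataset_id):
    """Retrieve a dataset by id, failing loudly if it does not exist."""
    # Assumed SDK call: client.api.get_dataset(id=...)
    dataset = client.api.get_dataset(id=dataset_id)
    if dataset is None:
        # Raising here makes a CI job stop early instead of running on nothing.
        raise LookupError(f"Dataset {dataset_id!r} was not found")
    return dataset
```

Failing early on a missing dataset is a deliberate choice for CI use, where a silent empty run would otherwise look like a passing experiment.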