A Dataset consists of Dataset Items. Each Dataset Item has an input and an expected output, and can optionally carry metadata. The input, expected output, and metadata of all items in a dataset should follow the same schema.
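The same-schema rule can be checked locally before items are uploaded. The sketch below is only an illustration: it treats "schema" as the set of top-level keys in each field, which is an assumption, and the helper names `item_schema` and `schema_mismatches` are hypothetical rather than part of the SDK.

```python
def item_schema(item):
    """Return the top-level key sets of an item's input, expected output, and metadata."""
    return tuple(
        frozenset((item.get(field) or {}).keys())
        for field in ("input", "expected_output", "metadata")
    )


def schema_mismatches(items):
    """Return the indexes of items whose key sets differ from the first item's."""
    if not items:
        return []
    reference = item_schema(items[0])
    return [i for i, item in enumerate(items) if item_schema(item) != reference]


items = [
    {
        "input": {"query": "Can you name a movie about space travel?"},
        "expected_output": {"response": 'A movie about space travel is "Interstellar".'},
    },
    {
        # "question" instead of "query": this item breaks the shared schema.
        "input": {"question": "Can you name a movie about the ocean?"},
        "expected_output": {"response": 'A movie about the ocean is "Finding Nemo".'},
    },
]

print(schema_mismatches(items))  # prints [1]
```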

Dataset Types

There are two types of datasets in Literal AI: Key Value and Generation.

Key Value Datasets

Key Value datasets accept arbitrary key-value pairs for the input and the expected output. You can use this type of dataset to store, for example, Runs of agents. An example of a Key Value Dataset Item:

input = {
  "query": "Can you name a movie about space travel?"
}

expected_output = { 
  "response": "A movie about space travel is \"Interstellar\"." 
}

Generation Datasets

Generation datasets follow the OpenAI message format. An example of a Generation Dataset Item:

input = {
  "messages": [{
    "role": "system",
    "content": "You are a helpful assistant."
  }, {
    "role": "user",
    "content": "Can you name a movie about space travel?"
  }]
}

expected_output = { 
  "role": "assistant",
  "content": "A movie about space travel is \"Interstellar\"." 
}

Create a Dataset

From a file

You can upload a dataset to Literal AI from a CSV file (Key Value datasets) or a JSONL file (Generation datasets).

To do so, go to the Datasets page and click on the + button to create a new dataset. Then drag and drop your file to upload it.

Uploading a Key Value Dataset
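If your examples live in code, you can generate the upload files with the Python standard library. The sketch below is a rough illustration only: the CSV column names, and the choice to put one object with `input` and `expected_output` keys on each JSONL line, are assumptions. The exact layout the upload dialog expects is not specified here, so adjust the output to what the page accepts.

```python
import csv
import json

# One row per Key Value item; the column names are illustrative.
key_value_rows = [
    {
        "query": "Can you name a movie about space travel?",
        "response": 'A movie about space travel is "Interstellar".',
    },
]

# One object per Generation item, using the OpenAI message format.
generation_items = [
    {
        "input": {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Can you name a movie about space travel?"},
            ]
        },
        "expected_output": {
            "role": "assistant",
            "content": 'A movie about space travel is "Interstellar".',
        },
    },
]


def write_key_value_csv(path, rows):
    """Write rows to a CSV file with a header taken from the first row's keys."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_generation_jsonl(path, items):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
```

Calling `write_key_value_csv("key_value_dataset.csv", key_value_rows)` and `write_generation_jsonl("generation_dataset.jsonl", generation_items)` produces two files that can then be dragged onto the upload area.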

From your code

You can also create a dataset and populate it with items programmatically using the SDK.

See installation to get your API key and instantiate the SDK.

Here is an example of how to create a dataset:
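A minimal sketch with the Python SDK follows. It assumes the `literalai` package is installed, that the API key is stored in the `LITERAL_API_KEY` environment variable, and that the client exposes `api.create_dataset` with `name`, `description`, and `metadata` arguments. Verify these names against the SDK reference for your version. The dataset name and the helper functions `make_client` and `create_movie_dataset` are illustrative.

```python
import os


def make_client():
    """Instantiate the SDK client (assumes `pip install literalai`)."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from literalai import LiteralClient

    return LiteralClient(api_key=os.getenv("LITERAL_API_KEY"))


def create_movie_dataset(client):
    """Create an empty dataset and return the object the SDK gives back."""
    return client.api.create_dataset(
        name="movie-recommendations",  # illustrative name
        description="Questions about movies with reference answers.",
        metadata={"isDemo": True},
    )


# Usage (needs a valid API key and network access):
# dataset = create_movie_dataset(make_client())
```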

Now that we have a dataset, we can create dataset items:
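The sketch below adds Key Value items to an existing dataset. It assumes the dataset object returned by the Python SDK offers a `create_item` method that takes `input` and `expected_output`. Treat that as an assumption to confirm in the SDK reference. The `add_items` helper and the `query` and `response` keys are illustrative.

```python
def add_items(dataset, pairs):
    """Create one dataset item per (question, answer) pair and return the results."""
    created = []
    for question, answer in pairs:
        item = dataset.create_item(
            input={"query": question},
            expected_output={"response": answer},
        )
        created.append(item)
    return created


pairs = [
    (
        "Can you name a movie about space travel?",
        'A movie about space travel is "Interstellar".',
    ),
]

# Usage, where `dataset` is the object returned when the dataset was created:
# add_items(dataset, pairs)
```

Every item uses the same keys, which keeps the dataset consistent with the same-schema rule.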

From existing logs

A benefit of managing your datasets and production logs in Literal AI is that you can continuously improve your app by using the logs to create new dataset items.

Create a Dataset Item from logs

Whenever you identify a new edge case in your logs, you can create a dataset item to account for it in your experiments. By editing the expected output to match the desired output, you ensure that your future experiments will account for this particular issue.
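The same idea can be sketched in code: take the input from a logged call, replace the observed output with the corrected one, and keep a reference to the source log. The log shape used here (`id`, `input`, `output`) and the `item_from_log` helper are hypothetical and do not reflect the actual log format in Literal AI.

```python
def item_from_log(log, corrected_output):
    """Build a dataset item from a logged call, overriding the expected output."""
    return {
        "input": log["input"],
        "expected_output": corrected_output,
        "metadata": {
            "source_log_id": log["id"],
            # Keep the faulty output so the edge case stays documented.
            "observed_output": log["output"],
        },
    }


log = {
    "id": "log-123",  # hypothetical identifier
    "input": {"query": "Can you name a movie about space travel?"},
    "output": {"response": "I cannot recommend movies."},
}

item = item_from_log(
    log,
    corrected_output={"response": 'A movie about space travel is "Interstellar".'},
)
print(item["expected_output"]["response"])
```

The resulting dictionary holds the three fields of a Dataset Item, so it can be stored alongside the other items in the dataset.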

Get a Dataset

Once you have created a dataset, you can retrieve it using the SDK to use it in your experiments or your CI.
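A minimal sketch of retrieving a dataset with the Python SDK follows. It assumes the client exposes `api.get_dataset(id=...)`, that the call returns nothing when the dataset is not found, and that the returned dataset has an `items` list whose elements expose `input` and `expected_output`. Check these names against the SDK reference before relying on them. The `load_examples` helper is illustrative.

```python
def load_examples(client, dataset_id):
    """Fetch a dataset by id and return its (input, expected_output) pairs."""
    dataset = client.api.get_dataset(id=dataset_id)
    if dataset is None:
        raise LookupError(f"Dataset {dataset_id!r} not found")
    return [(item.input, item.expected_output) for item in dataset.items]


# Usage, where `client` is an instantiated SDK client:
# for example_input, expected in load_examples(client, "YOUR_DATASET_ID"):
#     ...  # run your app on example_input and compare with expected
```

Returning plain pairs keeps experiment or CI code independent of the SDK's object types.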