Semantics

Literal AI Platform

Log Hierarchy on Literal AI

Literal AI approaches LLM logging at three levels:

  1. Generation: Log of a single LLM call. (Generations are Steps.)
  2. Run: Trace of an Agent/Chain run, including its intermediate steps. Can contain one or multiple generations.
  3. Thread: A collection of Runs that are part of a single conversation.

A Thread containing a run and intermediate steps in Literal AI

You can log a generation on its own (typical for extraction use cases), log a run on its own (typical for task automation), or combine runs into threads (typical for chatbots).

See installation to get your API key and instantiate the SDK.

Log an LLM Generation

Generations are logged by integrations with LLM providers. They capture the prompt, completion, settings, token usage, and latency.

Here is an example with OpenAI:
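Below is a minimal sketch with the Python SDK. `instrument_openai()` is the documented one-line integration; treat the exact setup (client constructor arguments, environment variable name) as assumptions and check the SDK reference.

```python
import os

from literalai import LiteralClient
from openai import OpenAI

literal_client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])
# One-time call: patches the OpenAI client so every completion is
# logged as a Generation on Literal AI.
literal_client.instrument_openai()

openai_client = OpenAI()
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```

After instrumentation, no further code changes are needed: each call made through the OpenAI client is captured automatically.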

Multimodal LLM

You can leverage multimodal capabilities on Literal AI in two ways:

  • Logging calls to multimodal LLM APIs, such as gpt-4o
  • Saving multimodal files as Attachments. Images, videos, audio, and other files appear as Attachments on the Literal AI platform, where they can be viewed and downloaded from a Step.
A logged multimodal LLM call

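The first option requires no extra work beyond instrumentation: a multimodal call goes through the instrumented OpenAI client as usual. A sketch, with the image URL as a placeholder:

```python
import os

from literalai import LiteralClient
from openai import OpenAI

literal_client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])
literal_client.instrument_openai()  # logs every OpenAI call as a Generation

openai_client = OpenAI()
# A multimodal message: text plus an image URL, handled by gpt-4o.
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
```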

Log a Run

A Run represents a trace of an Agent or Chain execution, capturing all intermediate steps and actions.

Runs can be logged manually using decorators or through framework integrations such as Llama Index or LangChain.

Log a Run with Intermediate Steps

Here’s how you can log a Run with intermediate steps using Python and TypeScript:
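A Python sketch using the SDK's step decorator (the TypeScript SDK mirrors this pattern). The decorator arguments and `flush_and_stop()` are taken from the SDK but should be checked against the current reference:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

@client.step(type="tool")
def search(query: str) -> str:
    # Intermediate step: logged as a child of the surrounding run.
    return f"results for {query}"

@client.step(type="run")
def my_agent(question: str) -> str:
    # The run wraps the whole agent execution; nested decorated calls
    # become its intermediate steps.
    context = search(question)
    return f"answer based on {context}"

my_agent("What is Literal AI?")
client.flush_and_stop()  # make sure buffered logs are sent before exit
```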

Add Metadata and Tags to Steps

Tags and Metadata can be added to both Runs and Steps to provide additional context and facilitate filtering and categorization.
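A sketch of setting both on a step via the context-manager form. The attribute names `tags` and `metadata` are assumed from the Step API; verify them in the reference:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

with client.step(type="run", name="checkout_agent") as step:
    # Tags are good for coarse filtering; metadata for arbitrary context.
    step.tags = ["production", "agent-v2"]
    step.metadata = {"customer_tier": "pro", "region": "eu-west"}
```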

Add Attachments to Steps

You can attach files to a Run or any of its intermediate steps, which is particularly useful for multimodal use cases.
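A sketch of attaching an image file to a step. `create_attachment` and its parameters are assumptions here; check the SDK reference for the exact signature:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

with client.step(type="run", name="vision_agent") as step:
    with open("chart.png", "rb") as image:
        # The attachment shows up on the step in the Literal AI UI,
        # where it can be previewed and downloaded.
        client.api.create_attachment(
            step_id=step.id,
            name="chart.png",
            mime="image/png",
            content=image.read(),
        )
```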

Learn More

The intermediate steps and the agent itself are logged using the Step class. You can learn more about the Step API in the following references:

Log a Thread

You can interact with an example Thread in the platform here.

It is up to the application to keep track of the thread ID and pass it to the Literal AI client. Every run logged with the same thread ID will be part of the same conversation.
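The session-to-thread mapping is the application's concern, not the SDK's. A minimal standard-library sketch (names are illustrative) that hands back a stable thread ID per conversation:

```python
import uuid

# In-memory mapping from the application's session ID to the Literal AI
# thread ID; a real app would persist this alongside the session.
_thread_ids: dict[str, str] = {}

def thread_id_for(session_id: str) -> str:
    """Return the same thread ID for every message in a session."""
    if session_id not in _thread_ids:
        _thread_ids[session_id] = str(uuid.uuid4())
    return _thread_ids[session_id]
```

Every run logged with the ID returned for a given session will land in the same conversation on Literal AI.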

Here is an example:
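A Python sketch using the SDK's thread context manager; runs logged inside the block share the thread's ID:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

@client.step(type="run")
def my_assistant(question: str) -> str:
    return f"Answer to: {question}"

# Both runs below carry the same thread ID, so they appear as a single
# conversation on Literal AI.
with client.thread(name="demo-conversation") as thread:
    my_assistant("Hello!")
    my_assistant("Tell me more.")
```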

You can learn more about the Thread API in the following references:

Bind a Thread to a User

You can bind a thread to a user to track the user’s activity. This is useful for chatbots or any other conversational AI.

To do so, you need to use a common identifier for the user, such as an email or a user ID:
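A sketch of binding a thread to a user. `get_or_create_user` and the `participant_id` attribute are assumptions; the identifier should be something stable, such as an email or internal user ID:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

# Idempotent: returns the existing user or creates one.
user = client.api.get_or_create_user(identifier="john.doe@example.com")

with client.thread(name="support-chat") as thread:
    # Every thread bound to this participant shows up under the
    # user's activity on Literal AI.
    thread.participant_id = user.id
```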

Log a Distributed Trace

Distributed Tracing Cookbook

Learn how to log distributed traces with Literal AI.

Add a Score

Scores allow you to evaluate the LLM system performance at three levels: LLM generations, Agent Runs and Conversation Threads.

Scores can be human-generated (human feedback, such as a thumbs up or down) or AI-generated (for instance, a hallucination evaluation).

They can be visualized on the dashboard charts and used as filters.
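A sketch of attaching a human score to a logged step (a generation or a run). `create_score` and its parameters are assumptions; the placeholder ID stands in for the ID of a step you have already logged:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

client.api.create_score(
    step_id="<logged-step-id>",
    name="user-feedback",
    type="HUMAN",          # "AI" for programmatic evaluations
    value=1,               # e.g. 1 for thumbs up, 0 for thumbs down
)
```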

Add a User Feedback

Correlate your LLM system with a product metric, such as conversion, churn, or upsell. This can be done by:

  • Adding a specific product-related score on Literal AI.
  • Sending the logged run ID to your analytics system, such as PostHog or Amplitude.
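For the second option, a sketch with the PostHog Python library; the event name and property keys are illustrative, and the run ID placeholder stands in for the ID of a run logged on Literal AI:

```python
import posthog

posthog.project_api_key = "<posthog-project-key>"

# Sending the Literal AI run ID with the product event lets you join
# LLM logs to product analytics later.
posthog.capture(
    "user-123",                 # distinct_id
    "llm_answer_feedback",      # event name
    {"literal_run_id": "<run-id>", "thumbs_up": True},
)
```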

Add an AI Evaluation Result

Refer to Evaluation

Fetch Existing Logs

You can fetch existing logs using the SDKs. Here is an example to fetch the last 5 threads where a user participated:
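A Python sketch; `get_threads` and the filter shape (`field`/`operator`/`value`) are assumptions to verify against the SDK reference:

```python
import os

from literalai import LiteralClient

client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

# Fetch the last 5 threads in which the given user participated.
threads = client.api.get_threads(
    first=5,
    filters=[{
        "field": "participantId",
        "operator": "eq",
        "value": "<user-id>",
    }],
)
for thread in threads.data:
    print(thread.id, thread.name)
```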

More generally, you can fetch any Literal AI object. Check out the SDKs and API reference to learn how.

On Literal AI

Filter logs

Leverage the powerful filters on Literal AI. Use these same filters to export your data using the SDKs.

Filter on existing logs

Debug logged LLM generations

Replay a logged LLM generation in the Playground

Add Tags and Scores from the UI

You can add tags and scores directly from the user interface.

Add a Tag

Add a Tag to a Thread

Conclusion

Logging with Literal AI is composable and unopinionated. It can be done at different levels depending on your use case.