Literal AI can log LLM applications that handle multimodal input as well as output. For example:

  • Images: Vision and Generation
  • Video: Vision and Generation
  • Audio: Speech-to-Text and Text-to-Speech
  • Other file types, like PDF files, are also supported

You can leverage multimodal capabilities on Literal AI in two ways:

  • Simple logging of API calls to multimodal LLM APIs, like gpt-vision
  • Saving multimodal files as Attachments. Images, videos, audio and other files are shown as Attachments in the Literal AI platform, where they can be accessed and downloaded via a Step.

Simple logging of multimodal LLM APIs

Use one of the integrations and multimodal logging is automatic. Alternatively, you can log the API call yourself with the ChatGeneration API.
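As an illustration of what such a logged call contains, here is a minimal sketch of building a multimodal chat message in the content-parts format used by OpenAI vision models (text plus an inline base64 image). The helper name and the surrounding wiring are illustrative, not part of the Literal AI SDK:

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat message mixing text and an inline base64 image,
    following the content-parts format used by OpenAI vision models."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Inline images are passed as data URLs in the image_url part
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

When such a message is sent through an instrumented client, the image part is captured alongside the text in the logged generation.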

Example of a logged multimodal LLM call

Attachments

Attachments serve to save files that are important to your LLM application but are not sent as-is to LLM APIs. Attachments are displayed this way:

Example of attachments

Attachments API

Check this Python guide for an example of multimodal conversation logging with OpenAI and Literal AI.