Multimodal
Literal AI can log LLM applications that handle multimodal input as well as output. For example:
- Images: Vision and Generation
- Video: Vision and Generation
- Audio: Speech-to-Text and Text-to-Speech
- Other file types, like PDF files, are also supported
You can leverage multimodal capabilities on Literal AI in two ways:
- Simple logging of API calls to multimodal LLM APIs, such as `gpt-vision`.
- Saving multimodal files as Attachments. Images, videos, audio, and other files are shown as an `Attachment` in the Literal AI platform, where they can be accessed and downloaded via a `Step`.
Simple logging of multimodal LLM APIs
If you use one of the integrations, multimodal logging is automatic.
You can also use the `ChatGeneration` API to log the API call.
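As a minimal sketch, the snippet below builds a vision-style chat message that pairs a text prompt with an inline base64-encoded image, following the OpenAI multimodal message format. The model name and the Literal AI setup shown in the comments are assumptions; adapt them to your integration.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat message pairing text with an inline base64 image,
    following the OpenAI vision message format."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }


# With Literal AI instrumentation (assumed setup; requires API keys):
#   from literalai import LiteralClient
#   from openai import OpenAI
#
#   client = LiteralClient()
#   client.instrument_openai()  # subsequent OpenAI calls are logged automatically
#
#   oai = OpenAI()
#   oai.chat.completions.create(
#       model="gpt-4o",  # any vision-capable model; name is an assumption
#       messages=[build_vision_message("Describe this image.", image_bytes)],
#   )
```

Once the client is instrumented, the vision call above is captured as a generation without any extra logging code.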
Example of a logged multimodal LLM call
Attachments
Attachments serve the purpose of saving files that are important to your LLM application but are not sent as-is to LLM APIs. Attachments are displayed this way:
Example of attachments
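As a sketch, the helper below reads a local file and packages it into an attachment payload (name, MIME type, base64 content) ready to upload alongside a `Step`. The SDK call in the comment is a hypothetical illustration; check the Attachments API reference for the exact method name and parameters.

```python
import base64
import mimetypes
from pathlib import Path


def prepare_attachment(path: str) -> dict:
    """Read a local file and package it as an attachment payload:
    file name, guessed MIME type, and base64-encoded content."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "mime": mime or "application/octet-stream",
        "content": base64.b64encode(p.read_bytes()).decode("ascii"),
    }


# Uploading alongside a step (method name is hypothetical -- see the
# Attachments API reference for the actual SDK call):
#   from literalai import LiteralClient
#   client = LiteralClient()
#   client.api.create_attachment(step_id=step.id, **prepare_attachment("report.pdf"))
```

Packaging the file this way keeps the raw bytes out of the LLM request while still associating them with the logged step.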
Attachments API
Check this Python guide for an example of multimodal conversation logging with OpenAI and Literal AI.