All changes and improvements to Literal AI are listed here. For changes in the SDKs, go to the Python SDK or TypeScript SDK.

Literal AI cloud is currently compatible with:

0.0.623-beta (September 16th, 2024)

AI Evaluation rules should be re-configured to account for the switch to custom Evaluator prompts. Check out Score-based Rules to get started!

New features

  • Introduced “Run Experiment” feature on datasets (see the sketch after this list)
  • Enabled custom prompt for LLM-as-a-Judge evaluators
  • Enabled structured output in Playground and in prompts
  • Added Total Cost chart to dashboard (input + output tokens)
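
The same flow can be scripted from the SDKs. Below is a minimal Python sketch, assuming the experiment helpers (client.api.create_experiment, experiment.log) behave as in recent SDK versions; the dataset id, application function and score name are placeholders:

```python
from literalai import LiteralClient

client = LiteralClient(api_key="lsk_...")  # your Literal AI API key

# Assumes an existing dataset; the id is a placeholder.
dataset = client.api.get_dataset(id="dataset-uuid")
experiment = client.api.create_experiment(
    name="prompt-v2-eval", dataset_id=dataset.id
)

def my_app(item_input: dict) -> dict:
    # Hypothetical application under test.
    return {"answer": "..."}

for item in dataset.items:
    output = my_app(item.input)
    experiment.log({
        "datasetItemId": item.id,
        "input": item.input,
        "output": output,
        # "exact-match" is a made-up score name for illustration.
        "scores": [{"name": "exact-match", "type": "AI", "value": 1.0}],
    })
```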

Improvements

  • Dataset table was enhanced with a side-panel Item View
  • Prompt Playground - Added keyboard shortcuts
  • Added ability to navigate from Generation to its root Run
  • Improved Step query efficiency for token count and environment filters
  • Improved queries for retrieving steps and threads
  • Optimized score management for faster edits and fewer queries
  • Enhanced worker performance with multi-threaded asynchronous step ingestion

Bug fixes

  • Prompt Playground:
    • Fixed template messages editing issues
    • Improved scroll behavior for streamed LLM generation
  • Enabled Google as a Provider in Playground
  • Updated prompt cache invalidation strategy

0.0.621-beta (September 2nd, 2024)

New features

  • We are introducing the concept of “Model Costs”, which allows you to monitor the actual costs associated with your logged generations. You can now set up the costs for the various LLM models you use in production, including negotiated prices (see the cost sketch after this list).
  • When you exceed your monthly allowance on the free tier, you will now be notified by a pop-up and a persistent message in the sidebar.
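
The cost math itself is simple. A minimal sketch with hypothetical per-million-token prices; Literal AI applies whatever rates you configure under Model Costs, so the numbers below are placeholders:

```python
# (input_price, output_price) in USD per 1M tokens -- placeholder rates,
# including e.g. a negotiated price for a fine-tuned model.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "my-fine-tune": (1.00, 3.00),
}

def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

print(generation_cost("gpt-4o", 1_200, 350))  # -> 0.0065
```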

Improvements

  • Add a settings column to the prompt versions table.
  • In the Settings page, you can now see details about a Score Schema by hovering over it.
  • Various performance improvements across the platform.

Bug fixes

  • Fixes an ingestion error when a step’s data includes Unicode NULL characters.
  • Fixes a race condition which could slow down step ingestion.

0.0.620-beta (August 27th, 2024)

New features

  • Logs now include Scores so that you can browse Human / AI evaluations of your application
  • The Dashboard includes two new tiles for recently ingested Runs & Scores, and you can jump to that data in one click.
  • You can now add descriptions to individual values in score schemas, to help both with human annotation and AI scoring
  • When setting up prompt A/B testing, you can now search prompt versions by their number

Improvements

  • API keys are now hidden by default with *** in Settings / LLM. Enjoy screen-sharing with your friends!
  • Improved Step ingestion speed, which means faster troubleshooting of your application.
  • Identify the origin of Generations at a glance by checking out the Prompt name & version column in the Logs / Generations table. We have also added the much-needed Output column!
  • Domain Experts can now browse Settings to better understand their restricted permissions.
  • Thread details now show a more discoverable Scores section, right underneath Tags.

Bug fixes

  • Fixed a bug where you couldn’t select a tag when searching for it in the Filters UI
  • Fixed a bug with query refreshing
  • Solved an issue with the registration of clients and products on Stripe

0.0.617-beta (August 12th, 2024)

New Features

  • Prompt A/B testing (replaces the champion system)

Improvements

  • The UI now supports any number of tags (previously capped at 100)
  • Annotation Queues, Datasets and Logs are now linked and directly traversable from the UI.
  • You can now edit a step before adding it to a Dataset from an Annotation Queue

Bug Fixes

  • Fixed multiple bugs in the Annotation Queues
  • Domain Experts are now able to add items to a dataset
  • Redis connection should no longer hang

0.0.615-beta (July 31st, 2024)

New features

  • Annotation Queues: you can now collaborate as a team and assign steps for review
  • Environments allow you to silo experiment, development, staging and production logs
  • Experiments:
    • Faster bootstrap to launch experiments without the need to link to a Dataset
    • Experiment items are now sortable by score in a leaderboard fashion for easier comparison
    • You can now troubleshoot your experiment items by visualizing the experiment runs as full traces
  • Generations: you can now enrich your logged LLM calls with metadata
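
A minimal Python sketch of the metadata enrichment, assuming client.get_current_step and the step metadata attribute behave as in recent SDK versions; the metadata keys are made up:

```python
from literalai import LiteralClient

client = LiteralClient(api_key="lsk_...")
client.instrument_openai()  # auto-logs OpenAI calls as Generations

@client.step(type="run", name="answer")
def answer(question: str) -> str:
    # Attach metadata to the wrapping step; logged LLM calls can be
    # enriched the same way. The keys below are hypothetical.
    step = client.get_current_step()
    step.metadata = {"customer_tier": "pro", "ab_bucket": "v2-ranking"}
    # ... call your LLM here ...
    return "..."
```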

Improvements

  • Work with your own LLM endpoint by configuring a Custom LLM provider in your settings, based on OpenAI’s messages API (see the endpoint sketch after this list)
  • Quickly identify your champion version in the Prompt versions table with a star icon
  • Swiftly re-use your Playground prompts in code by using the new Copy button on Template messages
  • If created via a Score Template, a Score gets linked to its template for traceability
  • Improved Generations table readability by displaying Input as the last message in OpenAI’s messages API
  • Unified & enhanced the user experience on score, tag and credentials creation
  • Added the latest Anthropic model, Claude 3.5 Sonnet
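
For the custom provider above, any endpoint that speaks OpenAI’s messages (chat completions) format will do. A minimal sketch using the OpenAI Python client; the URL, key and model name are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at your own endpoint.
client = OpenAI(
    base_url="https://llm.example.com/v1",  # your OpenAI-compatible server
    api_key="my-custom-key",
)

response = client.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```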

Bug fixes

  • Project deletion now completes successfully when experiments exist in the project
  • Rule invocation now works with Azure OpenAI credentials
  • Improved Markdown rendering of Thread Chat view

0.0.613-beta (July 23rd, 2024)

New features

  • Datasets: you can now create a dataset from a CSV file (see the sketch after this list)
  • Onboarding: empty pages in a new project now include code snippets and instructions to start sending data to Literal AI
  • Navigation: the sidebar has been revamped for flatter navigation between platform modules
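
The CSV import happens in the UI; the programmatic equivalent looks roughly like this, assuming client.api.create_dataset and client.api.create_dataset_item as exposed by recent Python SDK versions, with placeholder column names:

```python
import csv
from literalai import LiteralClient

client = LiteralClient(api_key="lsk_...")
dataset = client.api.create_dataset(name="support-questions")

with open("items.csv", newline="") as f:  # columns: input, expected_output
    for row in csv.DictReader(f):
        client.api.create_dataset_item(
            dataset_id=dataset.id,
            input={"question": row["input"]},
            expected_output={"answer": row["expected_output"]},
        )
```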

Improvements

  • A new, tighter tree view that better displays Chain of Thought reasoning
  • Add new Run/Generation filters when creating or updating an evaluation rule: model, duration, prompt lineage, prompt version
  • Improve editing of Azure OpenAI credentials
  • Various improvements on platform deployment, both for cloud and self-hosting

Bug fixes

  • Fix a bug with Azure OpenAI in the prompt playground and other LLM calls
  • The credentials table will now refresh correctly after creating or updating an item
  • Local LLMs are now correctly handled by the prompt playground and other LLM calls
  • Fix a bug on the prompt playground where changing settings could reset the message list

0.0.612-beta (July 10th, 2024)

New features

  • When scoring a step through the platform, we now track the user who created the score (see the sketch after this list)
  • We are preparing the platform for the upcoming release of the Annotation Queue
  • A new chart on the dashboard shows the number of runs per day per run name
  • Upon signup, a new account will now contain a default project populated with Threads, Steps, Datasets, etc.
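
For reference, scores can also be created programmatically. A minimal sketch, assuming client.api.create_score as in recent Python SDK versions; the step id and score name are placeholders:

```python
from literalai import LiteralClient

client = LiteralClient(api_key="lsk_...")

# Scores created through the platform UI now also record their author.
client.api.create_score(
    step_id="step-uuid",      # placeholder step id
    name="helpfulness",       # made-up score name
    type="HUMAN",
    value=1.0,
)
```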

Improvements

  • We are rolling out a new Role system. Possible roles are now as follows: Admin, AI Engineer, Domain Expert
  • We have revamped the creation, editing and deletion of Rules for Online Evaluation
  • We have improved screen space management in tables, notably when displaying code previews
  • Some design tweaks on the dashboard and dark mode

Fixes

  • Fixed a bug where the generation was not correctly displayed in a run
  • Fixed a bug where some logos would not display correctly in dark mode
  • Fixed a bug where the scores API could break if no generationId was provided

0.0.611-beta (July 1st, 2024)

New features

  • Easily navigate the Runs view with arrow keys
  • You can now filter Runs/Generations by score presence
  • You can now bulk add Generations to Datasets from the UI
  • Dark theme for the diff editor, box plots and toasters
  • Added a new “Run” chart to the dashboard

Improvements

  • This version ships the first iteration of our UI revamp
  • Images are now zoomable in the Prompt Playground

0.0.610-beta (June 24, 2024)

Improvements

  • Update the feedback button
  • Extend “Rules” table with filters and pagination
  • Update “is null” and “is not null” filters with a more explicit behavior
  • Improve the score element UI

Fixes

  • Fixed an issue where annotators could not access content
  • Fixed an issue when double-clicking on a date-picker
  • Fixed an issue related to “Generation” links

0.0.609-beta (June 17, 2024)

New features

  • Added a “maintainer” role for the project, which allows write access while preventing the user from managing the project

Improvements

  • Simplify the Generation and Step data handling
  • Rules can now be updated directly
  • In “Experiment” you can now see the diff between inputs and outputs columns
  • The navigation is improved
  • Score templates can now be accessed in the “Evaluate” section
  • When scoring with a “categorical” score, the displayed value is now the category name rather than the raw value
  • In the dashboards we no longer display nullish values as 0
  • Rules now have their own detail page

Fixes

  • Fixed an issue where prompt playground settings were not correctly persisted
  • Fixed an issue where some step rows were duplicated
  • Fixed an issue with the “dataset link”
  • Fixed an issue where it was not possible to select custom models in the playground
  • Fixed an issue with the “Generations” page pagination

0.0.608-beta (June 4, 2024)

New features

  • The Compare feature is now available! It allows you to compare Generations.
  • Self-service distribution of the Literal AI platform for self-hosting will debut in the coming week
  • A new user role has been added: “Annotator”, a user who can add tags and scores to the observability entities (e.g. Thread, Step…) and has no access outside of those.
  • Project administrators can now pick a user’s role when inviting them.

Improvements

  • Literal AI API keys are now shared within the project. Previously, admins could create “personal” API keys. Access remains restricted to admins.
  • We’re continuing our push towards a more consistent - and prettier - User Experience:
    • We’ve switched to a more vibrant color scheme
    • Made some visual tweaks on the Dashboard page
    • Observability items such as Threads, Runs, and others will now display as full pages rather than side-panes
    • And lots of other improvements across the platform
  • Some changes to the way the platform is deployed, both on our end and for our on-premise users:
    • Improved and centralized environment management
    • The Portkey AI Gateway is now handled directly inside the Node process
    • The BUCKET_NAME environment variable is no longer mandatory. Trying to store objects will log errors but not disrupt the rest of the operations

Fixes

  • This week’s release sees a big focus on performance, especially on the Users and individual Thread pages
  • We’ve also chased and squashed a few bugs related to:
    • Signing Attachment URLs (for object storage like S3)
    • Conflicts on unique userId
    • A visual bug on initialization of “continuous” score templates
  • Audio attachments now resolve correctly.
  • Links on prompt versions are now directed to the correct prompt.
  • Show the correct projects when accessing the prompt playground.

0.0.607-beta (May 27, 2024)

New features

  • Added online evaluations to score LLM generations on the fly.
  • Created the endpoint /api/my-project to quickly access a project ID with an API key (see the request sketch after this list)
  • Brushed up the Dashboard page with:
    • Browser-level customizable layouts of charts
    • Filters on each chart to select relevant data - also saved at the browser level
  • For token usage specifically, we offer multiple visualizations
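
A minimal sketch of calling the new endpoint; it assumes the API key goes in an x-api-key header and the standard cloud base URL, so adjust both for self-hosted deployments:

```python
import requests

resp = requests.get(
    "https://cloud.getliteral.ai/api/my-project",  # assumed base URL
    headers={"x-api-key": "lsk_..."},              # assumed header name
)
print(resp.json())  # contains your project id
```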

Improvements

  • Improved the look of Step badges, specifically colors.
  • Sharing threads now requires an additional privilege
  • Improved UX on text, audio, image and video attachments in Step details
  • Prompt versions show a visual “Open” button to jump to the Prompt Playground
  • Revamped the UI look of the side navigation
  • Stop sequences on Prompt Playground now show visual cues
  • Removed UUID columns across tables to improve readability
  • JSON & Text previews come with full screen & copy/paste options

Fixes

  • Newly created API keys do not contain special characters

0.0.606-beta (May 20, 2024)

Breaking Changes

  • Dataset: Renamed intermediary steps’ expectedOutput to output. In a Dataset, in the Intermediary Steps field, expectedOutput is renamed to just output, because this is the actual output of the LLM. This breaks backward compatibility for code relying on DatasetItem.intermediarySteps.expectedOutput; DatasetItem.expectedOutput remains unchanged. A short migration sketch follows.
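
A short migration sketch for code that reads the raw item payload; the payload shape here is illustrative:

```python
def migrate_intermediary_steps(item: dict) -> dict:
    """Rename the old expectedOutput key to output inside intermediary steps."""
    for step in item.get("intermediarySteps", []):
        if "expectedOutput" in step:
            step["output"] = step.pop("expectedOutput")
    return item
```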

New Features

  • Attachments now come with preview widgets (multi-modality)

Improvements

  • A page change on tables now scrolls back to top of table
  • We removed the name fallback to ID for threads; missing names are now shown as N/A
  • We reduced the indent of navigation sub-menus
  • The prompt playground now persists the credentials for your session
  • Improved user feedback options from the UI
  • JSONs in tables now display on multiple lines with syntax highlighting
  • Improved dashboard performance with data fetch in separate requests

Fixes

  • Fixed creation of attachments and scores when step doesn’t exist
  • Fixed thread duplication when filtering on errors
  • Fixed the upserts of step input/output to prevent exceeding the size limit

0.0.605-beta (May 13, 2024)

New Features

  • Support GPT-4o as LLM model provider
  • We now display a diff of the prompt settings when saving a prompt version
  • Steps now support tags

Improvements

  • We now populate the dataset item intermediarySteps when adding a step with children steps
  • The API credentials in the prompt playground have been moved
  • Generation details view now has a link to prompt
  • Support temperature settings higher than 1 for compatible LLMs

Fixes

  • Fix display bugs in prompt playground
  • Fix a bug where we allowed very large JSON inputs (see the guard sketch after this list):
    • metadata is now limited to 1 MB
    • Step input and expectedOutput are limited to 3 MB
  • Fix a bug where full-text searching threads would lead to a spike in CPU usage
  • Fix a rare bug that could occur when ingesting multiple steps with a new tag
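
A client-side guard mirroring the new limits might look like this; the exact byte thresholds are assumptions based on the 1 MB / 3 MB figures above:

```python
import json

MAX_METADATA_BYTES = 1_000_000   # ~1 MB limit on metadata
MAX_FIELD_BYTES = 3_000_000      # ~3 MB limit on input / expectedOutput

def check_payload(metadata: dict, input_: dict, expected_output: dict) -> None:
    """Raise before sending a payload the platform would reject (sketch)."""
    if len(json.dumps(metadata).encode()) > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds 1 MB")
    for name, field in [("input", input_), ("expectedOutput", expected_output)]:
        if len(json.dumps(field).encode()) > MAX_FIELD_BYTES:
            raise ValueError(f"{name} exceeds 3 MB")
```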

0.0.604-beta (May 6, 2024)

New Features

  • UI/UX: Page headers on the Literal AI platform now include a button that links to the documentation.
  • Release status page: Literal AI now has a status page at https://literalai.betteruptime.com/, where you can see service uptime.
  • Experiments/Score: You can now create Scores directly in Experiments.
  • Threads: There is now a search bar in the Threads table.

Improvements

  • Minor UI updates to:
    • Sidebar navigation
    • Scores table
    • Table filters
  • Tags: Pressing enter will now create a Tag
  • Warn on dataset deletion
  • Persist playground settings
  • Warn when creating a prompt

Fixes

  • Fixed the dashboard evolution badge tooltip showing the wrong period

0.0.603-beta (April 29, 2024)

New Features

  • Credentials: You can now share your LLM credentials to better collaborate through the prompt playground.

Improvements

  • Dashboard: New comparison badge on the dashboard displays data evolution.
  • UI of Thread and Dataset: At the top of a Thread or Dataset page, the location is now shown as a breadcrumb. This prevents getting lost in side sheets and improves navigation.
  • Settings UI: Split Settings menu in the UI into sub-menus for General, LLM and Team.
  • Prompts: New Created by column on the Prompt Version table, which improves the table display.

Fixes

  • Prompt Playground: Fix model select overflow (a minimal change: long model names in the select are now ellipsized when space is reduced)
  • Experiments: In comparison mode, parameters are made more explicit. In addition, the charts were inverted; this is now fixed.
  • Filters: Added handling for edge cases of the is null and not in filters on tags. This fixes the tag filters in the table, which were not working as intended before.
  • Tags: Newly created Tags are now visible in the UI when a Thread, Step or Generation page is refreshed. Tags are now refetched on page refresh.
  • Tags: Tags can now be added on generations being created.