LLM Inference servers (vLLM, etc.)

By integrating the OpenAI SDK with Literal AI’s instrumentation, you can also effectively monitor message-based inference servers such as LMStudio, vLLM or HuggingFace, ensuring that you have full visibility into the performance and usage of your AI models.

from literalai import LiteralClient
lc = LiteralClient()
lc.instrument_openai()
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = literalai_client.chat.completions.create(
  model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_S.gguf",
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}
  ],
  temperature=0.7,
)

print(completion.choices[0].message)

The same works for HuggingFace messages API with

base_url="https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1"

Other LLM Providers (Anthropic, etc.)Mistral AI

⌘I

Important Notice

Get Started

Application

Evaluation

Settings

Guides

Integrations

Self Hosting

More