Evaluating Context Relevancy with Ragas
The source code of this notebook can be found in the Literal AI GitHub cookbooks.
This notebook shows you how to validate changes to your RAG application against context relevancy.
We rely on Ragas to evaluate that metric, and then visualize our iterative experiments in Literal AI.
First, we create a dataset from an example RAG application. Second, we evaluate the impact of a retrieval parameter change (# of contexts) on context relevancy:
- Run a RAG application
- Create a Dataset
- Experiment A - # contexts = 2
  - Evaluate with Ragas
  - Persist experiment to Literal AI
- Experiment B - # contexts = 1
  - Evaluate with Ragas
  - Persist experiment to Literal AI
Run a RAG application
Create a Chroma vector database
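A minimal sketch of this step, assuming an in-memory chromadb client; the collection name and documents below are illustrative stand-ins for the cookbook's data (they echo the facts discussed at the end of this page):

```python
import chromadb

# In-memory client; the cookbook may use a persistent client instead.
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="facts")

# Illustrative documents embedded with Chroma's default embedding function.
collection.add(
    documents=[
        "The Eiffel Tower is located in Paris, France.",
        "The dog sleeps twelve hours a day.",
        "The dog loves chasing squirrels in the park.",
    ],
    ids=["fact-1", "fact-2", "fact-3"],
)
```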
Import the Literal AI SDK
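For instance, assuming the `literalai` package is installed and `LITERAL_API_KEY` is set in the environment:

```python
import os
from literalai import LiteralClient

# The client also picks up LITERAL_API_KEY automatically if api_key is omitted.
literalai_client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])
```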
Create a prompt
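A sketch of prompt creation via the SDK; the prompt name and Mustache-style template below are illustrative, not the cookbook's exact prompt:

```python
# Hypothetical prompt: the {{context}} and {{question}} variables are
# filled in at generation time.
prompt = literalai_client.api.get_or_create_prompt(
    name="RAG prompt",
    template_messages=[
        {
            "role": "system",
            "content": "Answer the question using only these contexts:\n{{context}}",
        },
        {"role": "user", "content": "{{question}}"},
    ],
)
```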
Ask questions to RAG application
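The pipeline itself could look roughly like this, with retrieval and generation logged to Literal AI; the model name, question, and step names are placeholders:

```python
from openai import OpenAI

openai_client = OpenAI()
literalai_client.instrument_openai()  # logs OpenAI calls as generation steps

@literalai_client.step(type="retrieval", name="Retrieve")
def retrieve(question: str, n_results: int):
    # Chroma returns one list of documents per query text.
    results = collection.query(query_texts=[question], n_results=n_results)
    return results["documents"][0]

@literalai_client.thread
def rag(question: str, n_results: int = 2) -> str:
    contexts = retrieve(question, n_results)
    messages = prompt.format_messages(
        context="\n".join(contexts), question=question
    )
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=messages,
    )
    return completion.choices[0].message.content

print(rag("Where is the Eiffel Tower?"))
```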
Create a Dataset
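A sketch of dataset creation; the dataset name and item contents are illustrative (the cookbook derives its items from the RAG runs logged above):

```python
dataset = literalai_client.api.create_dataset(
    name="Context Relevancy",  # illustrative name
    description="Questions asked to the example RAG application",
)

# One item per question; expected_output is optional.
dataset.create_item(
    input={"question": "Where is the Eiffel Tower?"},
    expected_output={"answer": "The Eiffel Tower is in Paris, France."},
)
```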
Experiment A
With # contexts = 2
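Reusing the `retrieve` and `rag` helpers sketched above, Experiment A might run as follows; for clarity we capture the contexts separately, even though `rag` retrieves internally, and we assume the SDK keeps items on the `Dataset` object:

```python
N_CONTEXTS = 2  # the retrieval parameter under test

experiment_a_samples = []
for item in dataset.items:
    question = item.input["question"]
    experiment_a_samples.append(
        {
            "question": question,
            "contexts": retrieve(question, n_results=N_CONTEXTS),
            "answer": rag(question, n_results=N_CONTEXTS),
        }
    )
```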
Evaluate with Ragas
Prepare Ragas data samples
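Ragas consumes a Hugging Face `datasets.Dataset` with `question`, `answer`, and `contexts` columns, where each `contexts` entry is a list of strings; for example:

```python
from datasets import Dataset

ragas_samples = Dataset.from_dict(
    {
        "question": [s["question"] for s in experiment_a_samples],
        "answer": [s["answer"] for s in experiment_a_samples],
        "contexts": [s["contexts"] for s in experiment_a_samples],
    }
)
```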
Run the evaluation
We will evaluate context relevancy, which checks how relevant the retrieved contexts are to the user's question.
The more unneeded details the contexts contain, the lower the score (between 0 and 1, 0 being least relevant).
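Running the metric could look like this; note that `context_relevancy` is the metric name in the Ragas versions this cookbook targets and may be renamed in newer releases:

```python
from ragas import evaluate
from ragas.metrics import context_relevancy

# Each sample's contexts are scored between 0 and 1 against its question.
results = evaluate(ragas_samples, metrics=[context_relevancy])
print(results)
```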
Persist experiment to Literal AI
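A sketch of persisting the scores, assuming the SDK's `create_experiment` / `experiment.log` flow; the exact payload shape may differ across SDK versions:

```python
# Per-sample scores from the Ragas result (assumes to_pandas is available).
scores = results.to_pandas()["context_relevancy"].tolist()

experiment = literalai_client.api.create_experiment(
    name="Experiment A - 2 contexts",  # illustrative name
    dataset_id=dataset.id,
    params={"n_contexts": 2},
)

for item, sample, score in zip(dataset.items, experiment_a_samples, scores):
    experiment.log(
        {
            "datasetItemId": item.id,
            "scores": [
                {"name": "context_relevancy", "type": "AI", "value": score}
            ],
            "input": {"question": sample["question"]},
            "output": {"contexts": sample["contexts"], "answer": sample["answer"]},
        }
    )
```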
Experiment B
With # contexts = 1
Evaluate with Ragas
We evaluate with the first context only to see how context relevancy is impacted.
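The same pipeline re-run with `n_results=1`; persisting Experiment B then mirrors the Experiment A snippet, with `params={"n_contexts": 1}`:

```python
# Same pipeline with a single retrieved context.
experiment_b_samples = []
for item in dataset.items:
    question = item.input["question"]
    experiment_b_samples.append(
        {
            "question": question,
            "contexts": retrieve(question, n_results=1),
            "answer": rag(question, n_results=1),
        }
    )

ragas_samples_b = Dataset.from_dict(
    {
        "question": [s["question"] for s in experiment_b_samples],
        "answer": [s["answer"] for s in experiment_b_samples],
        "contexts": [s["contexts"] for s in experiment_b_samples],
    }
)
results_b = evaluate(ragas_samples_b, metrics=[context_relevancy])
```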
Persist experiment to Literal AI
Visualize from Literal AI Experiments!
Comparing the two experiments in Literal AI, one can visualize the difference in retrieved contexts: two for Experiment A versus one for Experiment B.
Context relevancy captures the ratio of question-relevant facts in retrieved contexts.
When we retrieve irrelevant contexts (the two facts about the dog do not help answer the question), context relevancy is 1/3: one relevant fact out of three.
Once we limit ourselves to a single context, we retrieve exactly the one useful fact, which yields a maximum context relevancy of 1.
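In terms of the ratio described above (Ragas computes it from statements it extracts from the contexts; we count facts here for illustration):

$$\text{context relevancy} = \frac{\#\,\text{question-relevant facts in contexts}}{\#\,\text{facts in contexts}} \qquad \text{A: } \frac{1}{3} \approx 0.33 \qquad \text{B: } \frac{1}{1} = 1$$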
(Screenshot: comparison of experiments in Literal AI)