LLM Eval Suite

Requirements

System

Python 3
pip 25.2

API Keys

OpenAI API Key
DeepEval API Key

Set Up

Python Virtual Environment and installing requirements

python3 -m venv --upgrade-deps <myvenv>

source <myvenv>/bin/activate

pip install -r requirements.txt

Setting up Ollama

Install Ollama by downloading the installer for your platform:
- macOS
- Linux
- Windows

The following instructions are for macOS and Linux. Contributions of Windows instructions are welcome.

Run Ollama (the server stays in the foreground, so use a second terminal for the next commands):
ollama serve
Pull a model:
ollama pull llama3.1:8b
List models:
ollama list

You should see llama3.1:8b in the list. The LLM As A Judge test suite currently uses it as the model under test.
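The same check can be scripted. The following is a minimal sketch, not part of this repository: it assumes Ollama is listening on its default local address (http://localhost:11434) and uses the server's /api/tags endpoint, which returns the locally pulled models. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is running on its default local address and port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str = "llama3.1:8b") -> bool:
    """Return True if `wanted` is among the locally pulled models."""
    return wanted in model_names(tags_payload)


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the running Ollama server; raises URLError if it is unreachable."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags())` should return True once the pull has finished.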

API Keys

To run the test suite, you will need the following API keys:

OpenAI API Key
DeepEval API Key
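A quick pre-flight check can catch a missing key before any test starts. This is a minimal sketch under stated assumptions: the variable names OPENAI_API_KEY and DEEPEVAL_API_KEY are illustrative guesses, so check .env.template for the names the suite actually reads. The helper functions are hypothetical and not part of this repository.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumption: illustrative variable names; see .env.template for the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPEVAL_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def require_keys(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError naming every missing key; do nothing if all are set."""
    env = os.environ if env is None else env
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
```

Calling `require_keys()` with no arguments checks the current process environment, so the keys must already be exported or loaded from .env.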

How to run the tests

DeepEvals

LLM AS A JUDGE

pytest -q tests/llm-as-a-judge/correctness/test_deepeval.py

How To View Results

By default, results are written to logs/deepeval_runs.jsonl (configurable via EVAL_LOG_FILE in .env). Open view_evals_results.html in a browser and use the file picker to load your JSONL results.
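The JSONL file can also be inspected from Python. The sketch below is an illustration only: it assumes one JSON object per line (the JSONL convention) and makes no assumption about which fields each record contains. It reads EVAL_LOG_FILE from the process environment rather than parsing .env itself, and the function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location stated in this README.
DEFAULT_LOG_FILE = "logs/deepeval_runs.jsonl"


def resolve_log_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the results path, honoring EVAL_LOG_FILE when it is set."""
    env = os.environ if env is None else env
    return Path(env.get("EVAL_LOG_FILE") or DEFAULT_LOG_FILE)


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_jsonl(path: Path) -> list[dict]:
    """Read and parse a JSONL results file."""
    return parse_jsonl(Path(path).read_text(encoding="utf-8"))
```

For example, `len(load_jsonl(resolve_log_path()))` gives the number of logged records.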
