- Python 3
- pip 25.2
- OpenAI API Key
- DeepEval API Key
python3 -m venv --upgrade-deps <myvenv>
source <myvenv>/bin/activate
pip install -r requirements.txt
The following instructions are for macOS/Linux. Contributions covering Windows are welcome.
- Run Ollama
ollama serve
- Pull a model
ollama pull llama3.1:8b
- List models
ollama list
You should see llama3.1:8b listed. The LLM-as-a-judge test suite currently uses it as the model under test.
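If you prefer to check for the model from a script rather than reading the `ollama list` output, something like the following may work. It is a minimal sketch, assuming Ollama is serving on its default local address (`http://localhost:11434`) and that its `/api/tags` endpoint returns a `models` list with a `name` field; the helper names (`model_names`, `has_model`, `fetch_tags`) are illustrative and not part of this repository.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload):
    """Return the model names found in an /api/tags style payload."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted="llama3.1:8b"):
    """Return True if the wanted model tag appears in the payload."""
    return wanted in model_names(tags_payload)


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=5):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        payload = fetch_tags()
    except (OSError, ValueError) as exc:
        # OSError covers connection failures; ValueError covers invalid JSON.
        print(f"Could not query Ollama at {OLLAMA_TAGS_URL}: {exc}")
    else:
        print("llama3.1:8b available:", has_model(payload))
```

If the script reports that the model is not available, run the pull step again before starting the tests.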
To run the test suite, you will need the following API keys:
- OpenAI Key
- DeepEval Key
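A quick pre-flight check can catch a missing key before the tests start. The sketch below assumes the keys are exposed as environment variables. `OPENAI_API_KEY` is the name the OpenAI SDK conventionally reads, while `DEEPEVAL_API_KEY` is a hypothetical placeholder: this README does not state which variable holds the DeepEval key, so substitute the name your setup actually uses.

```python
import os

# Assumed variable names. DEEPEVAL_API_KEY is a placeholder, not a documented name.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPEVAL_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required names that are unset or blank in the given mapping."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Note that this only inspects the current process environment; it does not read a `.env` file.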
pytest -q tests/llm-as-a-judge/correctness/test_deepeval.py
By default, results are written to logs/deepeval_runs.jsonl (configurable via EVAL_LOG_FILE in .env). Open view_evals_results.html in a browser and use the file picker to load your JSONL results.
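For a quick look at the results without opening the HTML viewer, the JSONL file can also be read directly. This is a minimal sketch: it assumes each non-empty line is one JSON value and makes no assumption about the fields inside each record. It reads `EVAL_LOG_FILE` from the process environment only (it does not parse `.env`), and `load_jsonl` is an illustrative helper, not part of this repository.

```python
import json
import os
from pathlib import Path

# Default location stated above; overridden by EVAL_LOG_FILE if it is exported.
DEFAULT_LOG = "logs/deepeval_runs.jsonl"


def load_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    log_path = Path(os.environ.get("EVAL_LOG_FILE", DEFAULT_LOG))
    if log_path.exists():
        records = load_jsonl(log_path)
        print(f"{len(records)} record(s) in {log_path}")
        if records and isinstance(records[-1], dict):
            # Show the field names only; the record schema is not assumed here.
            print("Fields in the latest record:", sorted(records[-1]))
    else:
        print(f"No results file at {log_path}; run the test suite first.")
```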