Install the package (Python >= 3.8):
pip install .
For editable installs during development:
pip install -e .
Set up secrets for either the Azure OpenAI API or the OpenAI API:
export OPENAI_AZURE_ENDPOINT=
export OPENAI_AZURE_KEY=
or
export OPENAI_API_KEY=
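Before you run the scorer, you can check which of the two credential sets your environment actually provides. The sketch below is a hypothetical helper, not part of GEMBA. It only checks the variable names listed above and does not reproduce how GEMBA itself resolves credentials or which backend it prefers when both are set. Empty values count as unset, so a bare `export OPENAI_API_KEY=` is reported as missing.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the setup instructions above.
AZURE_VARS = ("OPENAI_AZURE_ENDPOINT", "OPENAI_AZURE_KEY")
OPENAI_VARS = ("OPENAI_API_KEY",)


def configured_backends(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the backends whose variables are all set to non-empty values.

    Hypothetical helper for checking your shell setup; GEMBA itself may
    resolve credentials differently.
    """
    env = os.environ if env is None else env
    backends = []
    if all(env.get(name) for name in AZURE_VARS):
        backends.append("azure")
    if all(env.get(name) for name in OPENAI_VARS):
        backends.append("openai")
    return backends


if __name__ == "__main__":
    found = configured_backends()
    print("Configured backends:", ", ".join(found) if found else "none")
```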
The scorer expects two files with the same number of lines (each hypothesis line is the translation of the corresponding source line) and prints a score for each line pair:
gemba --source=source.txt --hypothesis=hypothesis.txt --source_lang=English --target_lang=Czech --method="GEMBA-MQM" --model="gpt-4"
# or
python -m gemba --source=source.txt --hypothesis=hypothesis.txt --source_lang=English --target_lang=Czech --method="GEMBA-MQM" --model="gpt-4"
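A line-count mismatch between the two files is an easy mistake to make. You can catch it before spending API calls with a pre-check like the one below. This is a hypothetical sketch, not GEMBA's own input loader. It assumes UTF-8 files with one segment per line and pairs them the way the scorer's "line pair" description implies.

```python
from typing import List, Tuple


def pair_segments(sources: List[str], hypotheses: List[str]) -> List[Tuple[str, str]]:
    """Pair source and hypothesis segments by position.

    Raises ValueError when the counts differ, because every hypothesis
    line must correspond to exactly one source line.
    """
    if len(sources) != len(hypotheses):
        raise ValueError(
            f"source has {len(sources)} lines but hypothesis has {len(hypotheses)}"
        )
    return list(zip(sources, hypotheses))


def load_line_pairs(source_path: str, hypothesis_path: str) -> List[Tuple[str, str]]:
    """Read both files (assumed UTF-8) and pair them line by line."""
    with open(source_path, encoding="utf-8") as f:
        sources = [line.rstrip("\n") for line in f]
    with open(hypothesis_path, encoding="utf-8") as f:
        hypotheses = [line.rstrip("\n") for line in f]
    return pair_segments(sources, hypotheses)
```

For example, `load_line_pairs("source.txt", "hypothesis.txt")` either returns the aligned pairs or fails immediately with a message naming both line counts.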
The main recommended methods are GEMBA-MQM and GEMBA-DA, used with the gpt-4 model.
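To score the same file pair with both recommended methods, you can drive the `gemba` command from Python. The sketch below only uses the flags shown in the example command. It assumes that `GEMBA-DA` is accepted by `--method` in the same way as `GEMBA-MQM`. The helper names `build_gemba_command` and `run_recommended` are hypothetical.

```python
import subprocess
from typing import List

# The two methods recommended above, both paired with gpt-4.
RECOMMENDED_METHODS = ("GEMBA-MQM", "GEMBA-DA")


def build_gemba_command(source: str, hypothesis: str, source_lang: str,
                        target_lang: str, method: str,
                        model: str = "gpt-4") -> List[str]:
    """Assemble the `gemba` CLI call from the example as an argument list."""
    return [
        "gemba",
        f"--source={source}",
        f"--hypothesis={hypothesis}",
        f"--source_lang={source_lang}",
        f"--target_lang={target_lang}",
        f"--method={method}",
        f"--model={model}",
    ]


def run_recommended(source: str, hypothesis: str,
                    source_lang: str, target_lang: str) -> None:
    """Run both recommended methods in turn (needs API credentials)."""
    for method in RECOMMENDED_METHODS:
        cmd = build_gemba_command(source, hypothesis, source_lang, target_lang, method)
        subprocess.run(cmd, check=True)
```

Passing an argument list instead of a shell string means the quotes around `"GEMBA-MQM"` in the shell example are unnecessary. The shell strips them anyway, so the program receives the same values.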
Get mt-metrics-eval and download its resources:
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
alias mtme='python3 -m mt_metrics_eval.mtme'
mtme --download
cd ..
mv ~/.mt-metrics-eval/mt-metrics-eval-v2 mt-metrics-eval-v2
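The final `mv` step can also be done from Python, for example in a setup script. This hypothetical helper mirrors that command with one difference: it refuses to overwrite an existing target. Plain `mv` would instead nest the data inside an existing `mt-metrics-eval-v2` directory. The default download location `~/.mt-metrics-eval` is taken from the command above.

```python
import shutil
from pathlib import Path


def move_mtme_data(download_root: Path, target: Path) -> Path:
    """Move `download_root/mt-metrics-eval-v2` to `target`.

    Mirrors `mv ~/.mt-metrics-eval/mt-metrics-eval-v2 mt-metrics-eval-v2`,
    but raises instead of moving into an already existing target.
    """
    source = download_root / "mt-metrics-eval-v2"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; run `mtme --download` first")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(source), str(target))
    return target
```

Typical use from the repository root: `move_mtme_data(Path.home() / ".mt-metrics-eval", Path("mt-metrics-eval-v2"))`.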
Collect the data and run the scorer:
python gemba_da.py
export PYTHONPATH=mt-metrics-eval:$PYTHONPATH
python evaluate.py
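The three commands above can also be chained from a single Python script. This sketch keeps the original order: `gemba_da.py` runs with the unchanged environment, and `evaluate.py` runs with `mt-metrics-eval` prepended to `PYTHONPATH`. The helper names are hypothetical. The sketch assumes it is launched from the directory that contains both scripts and the `mt-metrics-eval` checkout.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping


def with_mtme_on_path(env: Mapping[str, str],
                      mtme_dir: str = "mt-metrics-eval") -> Dict[str, str]:
    """Return a copy of `env` with `mtme_dir` prepended to PYTHONPATH.

    Equivalent to `export PYTHONPATH=mt-metrics-eval:$PYTHONPATH`, except
    that no trailing separator is added when PYTHONPATH was unset.
    """
    new_env = dict(env)
    existing = new_env.get("PYTHONPATH")
    new_env["PYTHONPATH"] = (mtme_dir + os.pathsep + existing) if existing else mtme_dir
    return new_env


def run_evaluation() -> None:
    """Collect the data, then score it; stops at the first failing script."""
    subprocess.run([sys.executable, "gemba_da.py"], check=True)
    subprocess.run([sys.executable, "evaluate.py"], check=True,
                   env=with_mtme_on_path(os.environ))
```

Building a modified copy of the environment keeps the `PYTHONPATH` change scoped to `evaluate.py`. It does not persist in your shell the way `export` does.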
GEMBA code and data are released under the CC BY-SA 4.0 license.
You can read more about GEMBA-DA and GEMBA-MQM in our respective arXiv papers; the BibTeX entries for the published versions follow.
@inproceedings{kocmi-federmann-2023-gemba-mqm,
title = "{GEMBA-MQM}: Detecting Translation Quality Error Spans with {GPT-4}",
author = {Kocmi, Tom and Federmann, Christian},
booktitle = "Proceedings of the Eighth Conference on Machine Translation",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
}
@inproceedings{kocmi-federmann-2023-large,
title = "Large Language Models Are State-of-the-Art Evaluators of Translation Quality",
author = "Kocmi, Tom and Federmann, Christian",
booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
month = jun,
year = "2023",
address = "Tampere, Finland",
publisher = "European Association for Machine Translation",
url = "https://aclanthology.org/2023.eamt-1.19",
pages = "193--203",
}