LeeLanguageLab/URIELPlus-ProxyLM

Using ProxyLM Regressor with URIEL+

By Mason Shipton, David Anugraha, York Hay Ng


About ProxyLM

A framework for LM performance prediction

Abstract

Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper introduces ProxyLM, a scalable framework for predicting LM performance using proxy models in multilingual tasks. These proxy models act as surrogates, approximating the performance of the LM of interest. By leveraging proxy models, ProxyLM significantly reduces computational overhead on task evaluations, achieving up to a 37.08x speedup compared to traditional methods, even with our smallest proxy models. Additionally, our methodology showcases adaptability to previously unseen languages in pre-trained LMs, outperforming the state-of-the-art performance by 1.89x as measured by root-mean-square error (RMSE). This framework streamlines model selection, enabling efficient deployment and iterative LM enhancements without extensive computational resources.

If you are interested in more information, check out the full paper.

If you use this code for your research, please cite the following work:

@inproceedings{anugraha-etal-2025-proxylm,
    title = "{P}roxy{LM}: Predicting Language Model Performance on Multilingual Tasks via Proxy Models",
    author = "Anugraha, David  and
      Winata, Genta Indra  and
      Li, Chenyue  and
      Irawan, Patrick Amadeus  and
      Lee, En-Shiun Annie",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.106/",
    pages = "1981--2011",
    ISBN = "979-8-89176-195-7",
    abstract = "Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper presents ProxyLM, a scalable task- and language-agnostic framework designed to predict the performance of LMs using proxy models. These proxy models act as surrogates, approximating the performance of the LM of interest. By leveraging these proxy models, ProxyLM significantly reduces computational overhead in task evaluations, achieving up to a 37.08x speedup over traditional methods, even with our smallest proxy models. Our results across multiple multilingual NLP tasks and various robustness tests demonstrate that ProxyLM not only adapts well to previously unseen languages in pre-trained LMs, but also generalizes effectively across different datasets, outperforming the state-of-the-art by at least 1.78x in terms of root-mean-square error (RMSE)."
}

If you have any questions, you can open a GitHub Issue or send the authors an email.

About URIEL+

A knowledge base for natural language processing

Abstract

URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To tackle these challenges, we introduce URIEL+, an enhanced version of URIEL and lang2vec addressing these limitations. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves user experience with robust, customizable distance calculations to better suit the needs of the users. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies.

If you are interested in more information, check out the full paper.

If you use this code for your research, please cite the following work:

@inproceedings{khan-etal-2025-uriel,
    title = "{URIEL}+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base",
    author = {Khan, Aditya  and
      Shipton, Mason  and
      Anugraha, David  and
      Duan, Kaiyao  and
      Hoang, Phuong H.  and
      Khiu, Eric  and
      Do{\u{g}}ru{\"o}z, A. Seza  and
      Lee, En-Shiun Annie},
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.463/",
    pages = "6937--6952",
    abstract = "URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To tackle these challenges, we introduce URIEL+, an enhanced version of URIEL and lang2vec that addresses these limitations. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves the user experience with robust, customizable distance calculations to better suit the needs of users. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies."
}

If you have any questions, you can open a GitHub Issue or send the authors an email.

Check out ExploRIEL, the online UI for URIEL+: https://uriel-leelab.streamlit.app/

Environment

Requires Python 3.10 or later.

All dependencies are listed in the requirements/ folder.

Running ProxyLM Regressor

1. Distance Calculation

Run the following script to calculate URIEL+ distances:

python distances/calculate_distances.py

This will create two CSV files containing distances for the MT560 and NUSA language datasets.

Output files will be saved to the distances/ folder.
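
The exact filenames and columns that calculate_distances.py produces are not documented here, but the general shape of a distances CSV can be sketched as follows. Everything in this example (column names, language codes, values) is hypothetical and only illustrates the one-row-per-language-pair layout:

```python
import csv
import io

# Hypothetical layout for a distances CSV: one row per language pair,
# one column per URIEL+ distance type. The real files' columns may differ.
sample = io.StringIO()
writer = csv.DictWriter(
    sample,
    fieldnames=["source_lang", "target_lang", "genetic", "geographic", "syntactic"],
)
writer.writeheader()
writer.writerow({"source_lang": "eng", "target_lang": "ind",
                 "genetic": 1.0, "geographic": 0.81, "syntactic": 0.62})

# Reading the file back: csv returns every field as a string.
sample.seek(0)
rows = list(csv.DictReader(sample))
print(rows[0]["syntactic"])  # prints "0.62"
```

Remember to convert distance fields to float before feeding them to a regressor, since the csv module does not infer numeric types.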


2. Updating Experiment CSVs

After calculating distances, run:

python distances/replace_distances.py

This updates the experiment CSV files for MT560 and NUSA with URIEL+ distances.

Updated experiment CSVs will be saved to src/proxy_regressor/csv_datasets/.

📄 Note: To add a new distance type, follow the same format used for morphological distance in distances/replace_distances.py.
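
The core of this update step is a join: each experiment row is matched to its language pair and given the newly computed distance as an extra column. The sketch below is a simplified, in-memory illustration of that idea; the key and column names (`source_lang`, `inventory`, etc.) are hypothetical and not the repo's actual schema:

```python
# Newly computed distances, keyed by (source, target) language pair.
# "inventory" stands in for whatever new distance type you add.
new_distances = {
    ("eng", "ind"): 0.45,
    ("eng", "jav"): 0.57,
}

# Rows from an experiment CSV (normally loaded with the csv module).
experiment_rows = [
    {"source_lang": "eng", "target_lang": "ind", "genetic": 1.0},
    {"source_lang": "eng", "target_lang": "jav", "genetic": 1.0},
]

# Attach the new distance column to each row by language pair.
for row in experiment_rows:
    key = (row["source_lang"], row["target_lang"])
    row["inventory"] = new_distances[key]

print(experiment_rows[0]["inventory"])  # prints 0.45
```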


3. Changing Language Features

If you add or remove language features (e.g., introducing a new feature type), open src/proxy_regressor/utils.py and update the LANG_FEATURES list to include or exclude the appropriate language features.
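
Conceptually, the edit is just keeping a Python list in sync with the distance columns in your CSVs. The entries below are placeholders, not the real contents of LANG_FEATURES in utils.py:

```python
# Placeholder stand-in for the LANG_FEATURES list in src/proxy_regressor/utils.py.
# The real entries differ; the point is that any distance type added to (or
# dropped from) the experiment CSVs must be mirrored here so the regressor
# builds the matching feature columns.
LANG_FEATURES = [
    "genetic",
    "geographic",
    "syntactic",
    "morphological",
]

LANG_FEATURES.append("inventory")   # adding a new distance type
LANG_FEATURES.remove("geographic")  # or dropping one no longer used

print(LANG_FEATURES)
```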

4. Running Experiments

MT560 Experiments
  • Random Sampling (M2M100):

    python -m src.proxy_regressor.main -em random -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_mt560_m2m100.json -d mt560 -m m2m100
  • Random Sampling (NLLB):

    python -m src.proxy_regressor.main -em random -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_mt560_nllb.json -d mt560 -m nllb
  • Leave-One-Language-Out (LOLO) (M2M100):

    python -m src.proxy_regressor.main -em lolo -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_mt560_m2m100.json -d mt560 -m m2m100 -l all
  • Leave-One-Language-Out (LOLO) (NLLB):

    python -m src.proxy_regressor.main -em lolo -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_mt560_nllb.json -d mt560 -m nllb -l all
  • Seen/Unseen (M2M100):

    python -m src.proxy_regressor.main -em seen_unseen -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_mt560_m2m100.json -d mt560 -m m2m100

    After running the Seen/Unseen (M2M100) command, run:

    python unseen.py

    This outputs a text file with more readable results, along with the average standard error. NOTE: For Seen/Unseen (M2M100) experiments, use the average of test_source_rmse and test_target_rmse as the test_rmse.


NUSA Experiments
  • Random Sampling (M2M100):

    python -m src.proxy_regressor.main -em random -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_nusa_m2m100.json -d nusa -m m2m100
  • Random Sampling (NLLB):

    python -m src.proxy_regressor.main -em random -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_nusa_nllb.json -d nusa -m nllb
  • Leave-One-Language-Out (LOLO) (M2M100):

    python -m src.proxy_regressor.main -em lolo -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_nusa_m2m100.json -d nusa -m m2m100 -l all
  • Leave-One-Language-Out (LOLO) (NLLB):

    python -m src.proxy_regressor.main -em lolo -r xgb -rj src/proxy_regressor/regressor_configs/xgb_config_nusa_nllb.json -d nusa -m nllb -l all

📄 Note: After each experiment finishes, results are automatically saved to a .csv file. Extract the test RMSE and test SE from the CSV (you may need to average them across individual languages). Lower values indicate better performance, as RMSE measures error.
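
Averaging per-language results into a single test RMSE and test SE can be sketched as below. The column names and values are hypothetical stand-ins for rows read from the output CSV:

```python
import statistics

# Hypothetical per-language rows as they might appear in the results CSV;
# the real file's column names and languages may differ.
results = [
    {"lang": "ind", "test_rmse": 4.2, "test_se": 0.31},
    {"lang": "jav", "test_rmse": 5.8, "test_se": 0.45},
    {"lang": "sun", "test_rmse": 5.2, "test_se": 0.38},
]

avg_rmse = statistics.mean(r["test_rmse"] for r in results)
avg_se = statistics.mean(r["test_se"] for r in results)
print(f"average test RMSE: {avg_rmse:.2f}, average test SE: {avg_se:.2f}")
```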

Optional

5. Determining Statistical Significance

You can test the statistical significance of differences between results obtained with URIEL, URIEL+, or other URIEL versions.

Steps:

  1. Open test.py and update the parameters at line 19 to point to the correct experiment.

  2. Run:

    python test.py

    This will save the Y_test results from the experiment to a text file.

  3. Y_pred results from the experiment are saved in a file named {dataset_name}_{model_name}_Y_pred_results.txt.
    Copy both the Y_test and Y_pred values into statistical.py under the correct experiment section.

  4. Run:

    python statistical.py

    This will output the p-value measuring the statistical significance between the different URIEL results.
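
The exact test implemented in statistical.py is not documented here, but one standard way to compare two systems evaluated on the same test set is a paired sign-flip permutation test on their per-example squared errors. The sketch below illustrates that idea on made-up Y_test/Y_pred values; it is an assumed alternative, not necessarily the repo's actual test:

```python
import random

# Made-up predictions from two feature sets (e.g. URIEL vs. URIEL+)
# on the same six test examples.
y_test = [10.0, 12.5, 8.0, 15.0, 9.5, 11.0]
y_pred_uriel = [11.0, 14.0, 9.5, 13.0, 8.0, 12.5]
y_pred_urielplus = [10.5, 12.0, 8.5, 14.5, 9.0, 11.5]

def sq_errors(y_true, y_pred):
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

# Paired differences of squared errors; their sum is the test statistic.
diffs = [a - b for a, b in zip(sq_errors(y_test, y_pred_uriel),
                               sq_errors(y_test, y_pred_urielplus))]
observed = sum(diffs)

# Under the null hypothesis the sign of each paired difference is arbitrary,
# so randomly flip signs and count statistics at least as extreme.
random.seed(0)
n_perm = 10_000
extreme = sum(
    1 for _ in range(n_perm)
    if abs(sum(d * random.choice((-1, 1)) for d in diffs)) >= abs(observed)
)
p_value = (extreme + 1) / (n_perm + 1)  # two-sided, with add-one smoothing
print(f"p-value: {p_value:.3f}")
```

A small p-value (conventionally below 0.05) indicates the gap between the two URIEL variants is unlikely to be due to chance on this test set.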
