From f05e2574a6460b0632906f39c7760ded17827692 Mon Sep 17 00:00:00 2001
From: rileyh
Date: Tue, 29 Apr 2025 15:27:55 +0000
Subject: [PATCH 1/2] [#209] Remove notes about XGBoost being unstable

XGBoost recently went to version 3.0 and stabilized the Spark integration.
There don't seem to be any changes since 2.0 that affect hlink, so I think
that we're good to just remove these notes and stabilize the XGBoost
feature.
---
 README.md                        | 4 ----
 hlink/linking/core/classifier.py | 2 +-
 sphinx-docs/models.md            | 4 ++--
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 3092f3a..38b54c1 100755
--- a/README.md
+++ b/README.md
@@ -72,10 +72,6 @@ After installing the dependencies for one or both of these algorithms, you can
 use them as model types in training and model exploration. You can read more
 about these models in the hlink documentation [here](https://hlink.docs.ipums.org/models.html).
 
-*Note: The XGBoost-PySpark integration provided by the xgboost Python package is
-currently unstable. So the hlink xgboost support is experimental and may change
-in the future.*
-
 ## Docs
 
 The documentation site can be found at [hlink.docs.ipums.org](https://hlink.docs.ipums.org).
diff --git a/hlink/linking/core/classifier.py b/hlink/linking/core/classifier.py
index bb27123..b58780a 100644
--- a/hlink/linking/core/classifier.py
+++ b/hlink/linking/core/classifier.py
@@ -134,7 +134,7 @@ def choose_classifier(model_type: str, params: dict[str, Any], dep_var: str):
     elif model_type == "xgboost":
         if not _xgboost_available:
             raise ModuleNotFoundError(
-                "To use the experimental 'xgboost' model type, you need to install "
+                "To use the 'xgboost' model type, you need to install "
                 "the xgboost library and its dependencies. Try installing hlink with "
                 "the xgboost extra:\n\n pip install hlink[xgboost]"
             )
diff --git a/sphinx-docs/models.md b/sphinx-docs/models.md
index 31c9eb6..ad4739a 100644
--- a/sphinx-docs/models.md
+++ b/sphinx-docs/models.md
@@ -121,8 +121,8 @@ maxBins = 6
 
 XGBoost is an alternate, high-performance implementation of gradient boosting.
 It uses [xgboost.spark.SparkXGBClassifier](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.spark.SparkXGBClassifier).
-Since the XGBoost-PySpark integration which the xgboost Python package provides
-is currently unstable, support for the xgboost model type is disabled in hlink
+Since the XGBoost-PySpark integration requires some additional Python packages,
+support for the xgboost model type is disabled in hlink
 by default. hlink will stop with an error if you try to use this model type
 without enabling support for it. To enable support for xgboost, install hlink
 with the `xgboost` extra.

From d1ec989b4d0b81580e63b0bb1364096bc0cd0f31 Mon Sep 17 00:00:00 2001
From: rileyh
Date: Tue, 29 Apr 2025 15:38:33 +0000
Subject: [PATCH 2/2] [#209] Update the changelog

---
 sphinx-docs/changelog.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/sphinx-docs/changelog.md b/sphinx-docs/changelog.md
index 77814e9..a5f261d 100644
--- a/sphinx-docs/changelog.md
+++ b/sphinx-docs/changelog.md
@@ -21,6 +21,11 @@ Hlink adheres to semantic versioning as much as possible.
   invoked by `select_column_mapping` when the configuration calls for them.
   [PR #207][pr207]
 
+### Changed
+
+* Stabilized the XGBoost feature, since the integration provided by the xgboost
+  Python package is no longer unstable. [PR #219][pr219]
+
 ### Deprecated
 
 * The `hlink.linking.core.transforms.apply_transform` function, which applies
@@ -422,6 +427,7 @@ and false negative data in model exploration. [PR #1][pr1]
 [pr207]: https://github.com/ipums/hlink/pull/207
 [pr212]: https://github.com/ipums/hlink/pull/212
 [pr213]: https://github.com/ipums/hlink/pull/213
+[pr219]: https://github.com/ipums/hlink/pull/219
 
 [household-matching-docs]: config.html#household-matching
 [household-training-docs]: config.html#household-training-and-model-exploration