This repository contains all materials for reproducing the paper "To tune or not to tune? A meta-learning approach for recommending important hyperparameters":
- Scripts for collecting performance data of 6 machine learning algorithms on 200 classification tasks from the OpenML environment.
- The collected performance data of the SVM, Decision Tree, Random Forest, AdaBoost, Gradient Boosting and Extra Trees classifiers.
- Several notebooks, each of which performs one experiment and reports its results.
- New datasets derived from `PerformanceData`, all stored in the `output_csv` folders.
- Tools for:
  - Importing and modifying the collected data.
  - Searching for correlations between the dataset metafeatures and classifier performance.
  - Conducting statistical tests to compare the performance of the classifiers over the tasks.
  - Computing the best value for each important hyperparameter.
  - Computing the Wilcoxon test to verify the results.
- A script for extracting the metafeatures of the datasets.
- A script for performing fANOVA on the performance data.
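fANOVA (functional ANOVA) measures how much of the variance in performance each hyperparameter explains on its own. A hyperparameter whose main effect explains a large share of that variance is a natural candidate for tuning.

The repository's `do_fanova` is not reproduced here. The sketch below is a minimal illustration under one simplifying assumption: performance was measured on a complete grid of two hyperparameters, with equal weight per cell. In that case the decomposition can be computed exactly. The original fANOVA method instead fits a random-forest surrogate, so it does not need a full grid. The function name `grid_importance` and the toy grid are hypothetical.

```python
def grid_importance(grid):
    """Functional ANOVA on a full 2-D grid with uniform weights (sketch).

    grid[i][j] is the performance for the i-th value of hyperparameter A
    and the j-th value of hyperparameter B. Returns the fraction of total
    variance explained by A alone, by B alone, and by their interaction.
    """
    rows, cols = len(grid), len(grid[0])
    cells = [v for row in grid for v in row]
    mean = sum(cells) / len(cells)
    total_var = sum((v - mean) ** 2 for v in cells) / len(cells)
    if total_var == 0:
        return 0.0, 0.0, 0.0  # constant performance: nothing to attribute

    # Main effect of A: average over all values of B, minus the grand mean
    effect_a = [sum(row) / cols - mean for row in grid]
    # Main effect of B: average over all values of A, minus the grand mean
    effect_b = [sum(grid[i][j] for i in range(rows)) / rows - mean
                for j in range(cols)]

    var_a = sum(e ** 2 for e in effect_a) / rows
    var_b = sum(e ** 2 for e in effect_b) / cols
    var_ab = total_var - var_a - var_b  # remainder is the A x B interaction
    return var_a / total_var, var_b / total_var, var_ab / total_var


# Toy, purely additive grid: 2 values of A (rows) x 3 values of B (columns)
toy = [[0, 1, 2],
       [1, 2, 3]]
print(grid_importance(toy))  # roughly A: 0.27, B: 0.73, interaction: 0.0
```

Because the toy grid is additive, the interaction share is zero. In this example B varies more across its values than A does, so B would be ranked as the more important hyperparameter.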
```python
from DataCollection.functions import *

# Collect performance data for a single algorithm (here Decision Tree)
# on the datasets stored in path_to_datasets
path_to_datasets = 'Datasets/'
classification_per_algorithm(path=path_to_datasets, algorithm='DecisionTree')
```

```python
from fANOVA.fanova_functions import *

# Run fANOVA on the collected AdaBoost performance data
do_fanova(dataset_name='PerformanceData/AB_results_total.csv', algorithm='AdaBoost')
```

```python
from tools.metafeatures import *

# Extract metafeatures for every dataset (path_to_datasets as defined above)
extract_for_all(path_to_datasets)
```

```python
from Tools.database import Database

db = Database()
# Per-dataset accuracies of each classifier
per_dataset_acc = db.get_per_dataset_accuracies()
per_dataset_acc.head()
```
|  | dataset | AB | ET | RF | DT | GB | SVM |
|---|---|---|---|---|---|---|---|
| 0 | AP_Breast_Omentum.csv | 0.981060 | 0.976235 | 0.976462 | 0.973912 | 0.983555 | 0.914538 |
| 1 | AP_Breast_Prostate.csv | 0.995238 | 0.995238 | 0.995238 | 0.995238 | 0.995238 | 0.961498 |
| 2 | AP_Endometrium_Lung.csv | 0.968363 | 0.958392 | 0.957018 | 0.929240 | 0.968363 | 0.894591 |
| 3 | AP_Endometrium_Prostate.csv | 0.992857 | 0.992857 | 0.992857 | 1.000000 | 1.000000 | 0.984615 |
| 4 | AP_Endometrium_Uterus.csv | 0.854854 | 0.837953 | 0.859561 | 0.827924 | 0.860409 | 0.758801 |
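The tool list includes a Wilcoxon test for verifying the results. The repository's own implementation is not shown here. As an illustrative sketch, the exact two-sided Wilcoxon signed-rank test can be computed in pure Python when the number of tasks is small.

The example pairs the AdaBoost (AB) and SVM accuracies from the five rows of the table. Five datasets is far too few for a real conclusion, so this only demonstrates the mechanics. The function names `average_ranks` and `wilcoxon_signed_rank` are hypothetical. For larger samples, `scipy.stats.wilcoxon` provides a library implementation.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Zero differences are dropped (Wilcoxon's original convention).
    Returns (statistic, p_value) with statistic = min(W+, W-).
    The null distribution is enumerated, so this suits small samples only.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)

    # Under H0 each rank is positive or negative with probability 1/2.
    # Doubled ranks are integers even with ties, so sign patterns can be
    # counted by their doubled rank sum.
    counts = {0: 1}
    for r in ranks:
        r2 = round(2 * r)
        updated = dict(counts)       # patterns where this rank is negative
        for s, c in counts.items():  # patterns where this rank is positive
            updated[s + r2] = updated.get(s + r2, 0) + c
        counts = updated
    tail = sum(c for s, c in counts.items() if s <= round(2 * stat))
    p_value = min(1.0, 2 * tail / 2 ** len(ranks))
    return stat, p_value


# AdaBoost (AB) vs. SVM accuracies from the five table rows above
ab = [0.981060, 0.995238, 0.968363, 0.992857, 0.854854]
svm = [0.914538, 0.961498, 0.894591, 0.984615, 0.758801]
print(wilcoxon_signed_rank(ab, svm))  # (0, 0.0625)
```

AdaBoost beats SVM on all five rows, so no negative ranks exist and W- = 0. Even so, the smallest attainable two-sided p-value with five pairs is 2/32 = 0.0625. This is why such tests are run over many tasks.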
```python
metafeatures = db.get_metafeatures()
```
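Once the metafeatures are loaded, one way to search for correlations between them and classifier performance is Spearman's rank correlation. It is computed between a single metafeature and one classifier's accuracy across datasets.

The sketch below assumes two plain Python lists aligned by dataset. The metafeature values, the accuracies, and the function name `spearman` are all hypothetical. The repository's own correlation analysis may use a different coefficient. `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical metafeature (e.g. number of instances) for five datasets,
# paired with made-up accuracies of one classifier on the same datasets
n_instances = [120, 350, 800, 1500, 4000]
accuracy = [0.71, 0.78, 0.76, 0.85, 0.90]
print(round(spearman(n_instances, accuracy), 2))  # 0.9
```

Ranks make the coefficient insensitive to the very different scales of metafeatures (counts, ratios, entropies). It also captures any monotone relationship, not only a linear one.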