From aecee2ddaab9afbf1521adf3b987dff9c43683de Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 10:55:18 +0200
Subject: [PATCH 1/8] DOC: Reorder, rename and refine section headings in README

---
 README.md | 53 ++++++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 0e3e9da..20de329 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,4 @@
-
-
-1. [What is ORCA-python?](#what-is-orca-python)
-2. [Installing ORCA-python](#installing-orca-python)
-    1. [Installation Requirements](#installation-requirements)
-    2. [Download ORCA-Python](#download-orca-python)
-    3. [Algorithms Compilation](#algorithms-compilation)
-    4. [Installation in Python Environnement](#installation-in-python-environnement)
-    5. [Installation Testing](#installation-testing)
-3. [How to use ORCA-python](#how-to-use-orca-python)
-    1. [Configuration Files](#configuration-files)
-        1. [general-conf](#general-conf)
-        2. [configurations](#configurations)
-    2. [Running an Experiment](#running-an-experiment)
-
-
+# ORCA-python
 
 ## What is ORCA-python?
 
@@ -25,12 +10,25 @@ as long as they are implemented in scikit-learn, or self-implemented following c
 
 In this README, we will explain how to use ORCA-python, and what you need to install in order to run it. A Jupyter notebook is also avaible in [spanish](https://github.com/ayrna/orca-python/blob/master/doc/spanish_user_manual.md).

+## Table of Contents

-# Installing ORCA-python
+- [Installation](#installation)
+  - [Requirements](#requirements)
+  - [Download ORCA-Python](#download-orca-python)
+  - [Setup](#setup)
+  - [Testing Installation](#testing-installation)
+- [Quick Start](#quick-start)
+  - [Configuration Files](#configuration-files)
+    - [general-conf](#general-conf)
+    - [configurations](#configurations)
+  - [Running Experiments](#running-experiments)
+
+
+## Installation

 ORCA-python has been developed and tested in GNU/Linux systems. It has been tested with Python 3.8.

-## Installation Requirements
+### Requirements

 Besides the need for the aforementioned Python interpreter, you will need to install the next Python modules
 in order to run an experiment (needs recent versions of scikit-learn >=1.0.0):
@@ -46,7 +44,7 @@ For the installation of the modules, you may follow this [Python's Official Guid

 All dependencies and build configurations are managed through `pyproject.toml` file. This simplifies the setup process by allowing you to install the framework and its dependencies.

-## Download ORCA-Python
+### Download ORCA-Python

 To download ORCA-python you can simply clone this GitHub repository by using the following commands:

@@ -54,7 +52,7 @@ To download ORCA-python you can simply clone this GitHub repository by using the

 All the contents of the repository can also be downloaded from the GitHub site by using the "Download ZIP" button.

-## Installation in Python Environnement
+### Setup

 Inside the ORCA-python root, execute the following command to install the framework along with its dependencies: `pip install .`

@@ -64,7 +62,7 @@ Additionally. optional dependencies for development (e.g., black) can be install

 Note: The editable mode is required for running tests due to automatic dependency resolution.

-## Installation Testing
+### Testing Installation

 We provide a pre-made experiment (dataset and configuration file) to test if everything has been correctly installed.\
 The way to run this test (and all experiments) is the following:
@@ -74,14 +72,12 @@ The way to run this test (and all experiments) is the following:
   $ python config.py with orca_python/configurations/full_functionality_test.json -l ERROR
   ```

-
-# How to use ORCA-python
-
+## Quick Start

 This tutorial uses three small datasets (balance-scale, contact-lenses and tae) contained in "datasets" folder.
 The datasets are already partitioned with a 30-holdout experimental design (train and test pairs for each partition).

-## Configuration Files
+### Configuration Files

 All experiments are run through configuration files, which are written in JSON format, and consist of two well differentiated
 sections:
@@ -93,7 +89,7 @@ Each one of this sections will be inside a dictionary, having the said section n

 For a better understanding of the way this files works, it's better to follow an example, that can be found in: [configurations/full_functionality_test.json](https://github.com/ayrna/orca-python/blob/master/configurations/full_functionality_test.json).

-### general-conf
+#### general-conf

 ```
 "general_conf": {
@@ -122,7 +118,7 @@ For a better understanding of the way this files works, it's better to follow an
 Most of this variables do have default values (specified in [config.py](https://github.com/ayrna/orca-python/blob/master/config.py)), but "basedir" and "datasets" must always be written for the experiment to be run. Take into account, that all variable names in "general-conf" cannot be modified, otherwise the experiment will fail.


-### configurations
+#### configurations

 this dictionary will contain, at the same time, one dictionary for each configuration to try over the datasets during the experiment. This is, a classifier with some specific hyper-parameters to tune. (Keep in mind, that if two or more configurations share the same name, the later ones will be omitted)

@@ -204,8 +200,7 @@ Each configuration has a name (whatever you want), and consists of:
 *In ensemble methods, as `OrdinalDecomposition`, you must nest another classifier (the base classifier, which doesn't have a configuration name), with it's respective parameters to tune.*


-
-## Running an Experiment
+### Running Experiments

 As viewed in [Installation Testing](#installation-testing), running an experiment is as simple as executing Config.py
 with the python interpreter, and tell what configuration file to use for this experiment, resulting in the next command:

From bb9e40dbdf84e9572a1a198bbebc6888348a2cc2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 11:15:14 +0200
Subject: [PATCH 2/8] DOC: Update What is ORCA-python? section

---
 README.md | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 20de329..d1ff141 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,7 @@

 ## What is ORCA-python?

-ORCA-python is an experimental framework, completely built on Python (integrated with scikit-learn and sacred modules),
-that seeks to automatize the run of machine learning experiments through simple-to-understand configuration files.
-
-ORCA-python has been initially created to test ordinal classification, but it can handle regular classification algorithms,
-as long as they are implemented in scikit-learn, or self-implemented following compatibility guidelines form scikit-learn.
-
-In this README, we will explain how to use ORCA-python, and what you need to install in order to run it. A Jupyter notebook is also avaible in [spanish](https://github.com/ayrna/orca-python/blob/master/doc/spanish_user_manual.md).
+**ORCA-python** is an experimental framework built on Python that seamlessly integrates with scikit-learn and sacred modules to automate machine learning experiments through simple JSON configuration files. Initially designed for ordinal classification, it supports regular classification algorithms as long as they are compatible with scikit-learn, making it easy to run reproducible experiments across multiple datasets and classification methods.

 ## Table of Contents

From 604d8b6b1149ccd21aa6f5b3baa3eb383752a7ec Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 11:23:21 +0200
Subject: [PATCH 3/8] DOC: Update Installation section

---
 README.md | 63 ++++++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index d1ff141..d328296 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,6 @@

 - [Installation](#installation)
   - [Requirements](#requirements)
-  - [Download ORCA-Python](#download-orca-python)
   - [Setup](#setup)
   - [Testing Installation](#testing-installation)
 - [Quick Start](#quick-start)
@@ -20,51 +19,49 @@

 ## Installation

-ORCA-python has been developed and tested in GNU/Linux systems. It has been tested with Python 3.8.
-
 ### Requirements

-Besides the need for the aforementioned Python interpreter, you will need to install the next Python modules
-in order to run an experiment (needs recent versions of scikit-learn >=1.0.0):
-
-- numpy (tested with version 2.2.2)
-- pandas (tested with version 2.2.3)
-- sacred (tested with version 0.8.7)
-- scikit-learn (tested with version 1.6.1)
-- scipy (tested with version 1.15.1)
-
-To install Python, you can use the package management system you like the most.\
-For the installation of the modules, you may follow this [Python's Official Guide](https://docs.python.org/2/installing/index.html).
-
-All dependencies and build configurations are managed through `pyproject.toml` file. This simplifies the setup process by allowing you to install the framework and its dependencies.
-
-### Download ORCA-Python
+ORCA-python requires Python 3.8 or higher and is tested on Python 3.8, 3.9, 3.10, and 3.11.

-To download ORCA-python you can simply clone this GitHub repository by using the following commands:
-
-  `$ git clone https://github.com/ayrna/orca-python`
-
-All the contents of the repository can also be downloaded from the GitHub site by using the "Download ZIP" button.
+All dependencies are managed through `pyproject.toml` and include:
+- numpy (>=1.24.4)
+- pandas (>=2.0.3)
+- sacred (>=0.8.7)
+- scikit-learn (>=1.3.2)
+- scipy (>=1.10.1)

 ### Setup

-Inside the ORCA-python root, execute the following command to install the framework along with its dependencies: `pip install .`
+1. **Clone the repository**:
+   ```bash
+   git clone https://github.com/ayrna/orca-python
+   cd orca-python
+   ```

-All dependencies and build configurations are managed through the `pyproject.toml` file, simplifying the installation process. FOr development or testing purposes, you can use the `--editable` option to allow modifications without reinstalling: `pip install --editable .`
+2. **Install the framework**:
+   ```bash
+   pip install .
+   ```

-Additionally. optional dependencies for development (e.g., black) can be installed using the corresponding groups defined in the `pyproject.toml` file. For example: `pip install -e .[dev]`
+   For development purposes, use editable installation:
+   ```bash
+   pip install -e .
+   ```

-Note: The editable mode is required for running tests due to automatic dependency resolution.
+   Optional dependencies for development:
+   ```bash
+   pip install -e .[dev]
+   ```
+
+> **Note:** The editable mode is required for running tests due to automatic dependency resolution.

 ### Testing Installation

-We provide a pre-made experiment (dataset and configuration file) to test if everything has been correctly installed.\
-The way to run this test (and all experiments) is the following:
+Test your installation with the provided example:

-  ```
-  # Go to framework main folder
-  $ python config.py with orca_python/configurations/full_functionality_test.json -l ERROR
-  ```
+```bash
+python config.py with orca_python/configurations/full_functionality_test.json -l ERROR
+```

 ## Quick Start

From 5df707180a8610095498e8ce0c04d0aa5f5f2bb9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 12:00:41 +0200
Subject: [PATCH 4/8] DOC: Update Quick Start section

---
 README.md | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index d328296..6db915d 100644
--- a/README.md
+++ b/README.md
@@ -65,8 +65,51 @@ python config.py with orca_python/configurations/full_functionality_test.json -l

 ## Quick Start

-This tutorial uses three small datasets (balance-scale, contact-lenses and tae) contained in "datasets" folder.
-The datasets are already partitioned with a 30-holdout experimental design (train and test pairs for each partition).
+ORCA-python includes sample datasets with pre-partitioned train/test splits using a 30-holdout experimental design.
+
+**Basic experiment configuration:**
+
+```json
+{
+    "general_conf": {
+        "basedir": "orca_python/datasets/data",
+        "datasets": ["balance-scale", "contact-lenses", "tae"],
+        "hyperparam_cv_nfolds": 3,
+        "output_folder": "results/",
+        "metrics": ["ccr", "mae", "amae"],
+        "cv_metric": "mae"
+    },
+    "configurations": {
+        "SVM": {
+            "classifier": "sklearn.svm.SVC",
+            "parameters": {
+                "C": [0.001, 0.1, 1, 10, 100],
+                "gamma": [0.1, 1, 10]
+            }
+        },
+        "SVMOP": {
+            "classifier": "orca_python.classifiers.OrdinalDecomposition",
+            "parameters": {
+                "dtype": "ordered_partitions",
+                "decision_method": "frank_hall",
+                "base_classifier": "sklearn.svm.SVC",
+                "parameters": {
+                    "C": [0.01, 0.1, 1, 10],
+                    "gamma": [0.01, 0.1, 1, 10],
+                    "probability": ["True"]
+                }
+            }
+        }
+    }
+}
+```
+
+**Run the experiment:**
+```bash
+python config.py with my_experiment.json -l ERROR
+```
+
+Results are saved in `results/` folder with performance metrics for each dataset-classifier combination. The framework automatically performs cross-validation, hyperparameter tuning, and evaluation on test sets.

 ### Configuration Files

From 33c253b2886c895429d995e6aaa1bbbd9d22ee4f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 12:10:02 +0200
Subject: [PATCH 5/8] DOC: Update Configuration Files section

---
 README.md | 130 +++++++----------------------------------------------
 1 file changed, 17 insertions(+), 113 deletions(-)

diff --git a/README.md b/README.md
index 6db915d..5f7e081 100644
--- a/README.md
+++ b/README.md
@@ -11,10 +11,10 @@
   - [Setup](#setup)
   - [Testing Installation](#testing-installation)
 - [Quick Start](#quick-start)
-  - [Configuration Files](#configuration-files)
-    - [general-conf](#general-conf)
-    - [configurations](#configurations)
-  - [Running Experiments](#running-experiments)
+- [Configuration Files](#configuration-files)
+  - [general-conf](#general-conf)
+  - [configurations](#configurations)
+- [Running Experiments](#running-experiments)


 ## Installation
@@ -111,130 +111,34 @@ python config.py with my_experiment.json -l ERROR

 Results are saved in `results/` folder with performance metrics for each dataset-classifier combination. The framework automatically performs cross-validation, hyperparameter tuning, and evaluation on test sets.

-### Configuration Files
+## Configuration Files

-All experiments are run through configuration files, which are written in JSON format, and consist of two well differentiated
-sections:
+Experiments are defined using JSON configuration files with two main sections: general_conf for experiment settings and configurations for classifier definitions.

-  - **`general-conf`**: indicates basic information to run the experiment, such as the location to datasets, the names of the different datasets to run, etc.
-  - **`configurations`**: tells the framework what classification algorithms to apply over all the datasets, with the collection of hyper-parameters to tune.
+### general-conf

-Each one of this sections will be inside a dictionary, having the said section names as keys.
-
-For a better understanding of the way this files works, it's better to follow an example, that can be found in: [configurations/full_functionality_test.json](https://github.com/ayrna/orca-python/blob/master/configurations/full_functionality_test.json).
-
-#### general-conf
-
-```
-"general_conf": {
-
-    "basedir": "ordinal-datasets/ordinal-regression/",
-    "datasets": ["tae", "balance-scale", "contact-lenses"],
-    "hyperparam_cv_folds": 3,
-    "jobs": 10,
-    "input_preprocessing": "std",
-    "output_folder": "my_runs/",
-    "metrics": ["ccr", "mae", "amae", "mze"],
-    "cv_metric": "mae"
-}
-```
-*note that all the keys (variable names) must be strings, while all pair: value elements are separated by commas.*
+Controls global experiment parameters.

+**Required parameters:**
 - **`basedir`**: folder containing all dataset subfolders, it doesn't allow more than one folder at a time. It can be indicated using a full path, or a relative one to the framework folder.
 - **`datasets`**: name of datasets that will be experimented with. A subfolder with the same name must exist inside `basedir`.
+
+**Optional parameters:**
 - **`hyperparam_cv_folds`**: number of folds used while cross-validating.
 - **`jobs`**: number of jobs used for GridSearchCV during cross-validation.
-- **`input_preprocessing`**: type of preprocessing to apply to the data, **`std`** for standardization and **`norm`** for normalization. Assigning an empty srtring will omit the preprocessing process.
+- **`input_preprocessing`**: data preprocessing (`"std"` for standardization, `"norm"` for normalization, `""` for none)
 - **`output_folder`**: name of the folder where all experiment results will be stored.
 - **`metrics`**: name of the accuracy metrics to measure the train and test performance of the classifier.
 - **`cv_metric`**: error measure used for GridSearchCV to find the best set of hyper-parameters.

-Most of this variables do have default values (specified in [config.py](https://github.com/ayrna/orca-python/blob/master/config.py)), but "basedir" and "datasets" must always be written for the experiment to be run. Take into account, that all variable names in "general-conf" cannot be modified, otherwise the experiment will fail.
-
-
-#### configurations
-
-this dictionary will contain, at the same time, one dictionary for each configuration to try over the datasets during the experiment. This is, a classifier with some specific hyper-parameters to tune. (Keep in mind, that if two or more configurations share the same name, the later ones will be omitted)
-
-```
-"configurations": {
-    "SVM": {
-
-        "classifier": "sklearn.svm.SVC",
-        "parameters": {
-            "C": [0.001, 0.1, 1, 10, 100],
-            "gamma": [0.1, 1, 10]
-        }
-    },
-    "SVMOP": {
-
-        "classifier": "orca_python.classifiers.OrdinalDecomposition",
-        "parameters": {
-            "dtype": "ordered_partitions",
-            "decision_method": "frank_hall",
-            "base_classifier": "sklearn.svm.SVC",
-            "parameters": {
-                "C": [0.01, 0.1, 1, 10],
-                "gamma": [0.01, 0.1, 1, 10],
-                "probability": ["True"]
-            }
-
-        }
-    },
-    "LR": {
-
-        "classifier": "orca_python.classifiers.OrdinalDecomposition",
-        "parameters": {
-            "dtype": ["ordered_partitions", "one_vs_next"],
-            "decision_method": "exponential_loss",
-            "base_classifier": "sklearn.linear_model.LogisticRegression",
-            "parameters": {
-                "solver": ["liblinear"],
-                "C": [0.01, 0.1, 1, 10],
-                "penalty": ["l1","l2"]
-            }
-
-        }
-    },
-    "REDSVM": {
-
-        "classifier": "orca_python.classifiers.REDSVM",
-        "parameters": {
-            "t": 2,
-            "c": [0.1, 1, 10],
-            "g": [0.1, 1, 10],
-            "r": 0,
-            "m": 100,
-            "e": 0.001,
-            "h": 1
-        }
-
-    },
-    "SVOREX": {
-
-        "classifier": "orca_python.classifiers.SVOREX",
-        "parameters": {
-            "kernel_type": 0,
-            "c": [0.1, 1, 10],
-            "k": [0.1, 1, 10],
-            "t": 0.001
-        }
-
-    }
-}
-```
-
-Each configuration has a name (whatever you want), and consists of:
-
-- **`classifier`**: tells the framework which classifier to use. Can be specified in two different ways:
-    - A relative path to the classifier in sklearn module.
-    - The name of a built-in class in Classifiers folder (found in the main folder of the project).
-- **`parameters`**: hyper-parameters to tune, having each one of them a list of values to cross-validate (not really necessary, can be just one value).
+### configurations

-*In ensemble methods, as `OrdinalDecomposition`, you must nest another classifier (the base classifier, which doesn't have a configuration name), with it's respective parameters to tune.*
+Defines classifiers and their hyperparameters for GridSearchCV. Each configuration has a name and consists of:
+- **`classifier`**: scikit-learn path or built-in ORCA-python classifier
+- **`parameters`**: hyperparameters for grid search (nested for ensemble methods)


-### Running Experiments
+## Running Experiments

 As viewed in [Installation Testing](#installation-testing), running an experiment is as simple as executing Config.py
 with the python interpreter, and tell what configuration file to use for this experiment, resulting in the next command:

From 4502f13fba4917474f069fb0a974ac0472512587 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 12:14:49 +0200
Subject: [PATCH 6/8] DOC: Update Running Experiments section

---
 README.md | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 5f7e081..fe948e5 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,9 @@
   - [general-conf](#general-conf)
   - [configurations](#configurations)
 - [Running Experiments](#running-experiments)
-
+  - [Basic Usage](#basic-usage)
+  - [Recommended Usage](#recommended-usage)
+  - [Example Output](#example-output)

 ## Installation

@@ -140,18 +142,24 @@ Defines classifiers and their hyperparameters for GridSearchCV. Each configurati

 ## Running Experiments

-As viewed in [Installation Testing](#installation-testing), running an experiment is as simple as executing Config.py
-with the python interpreter, and tell what configuration file to use for this experiment, resulting in the next command:
+### Basic Usage
+
+```bash
+python config.py with experiment_file.json
+```

-  `$ python config.py with experiment_file.json`
+### Recommended Usage

-Running an experiment this way has two problems though, one of them being an excessive verbosity from Sacred,
-while the other consists of the non-reproducibility of the results of the experiment, due to the lack of a fixed seed.
+For reproducible results with minimal output:

-Both problems can be easily fixed. The seed can be specified after "with" in the command:
+```bash
+python config.py with experiment_file.json seed=12345 -l ERROR
+```

-  `$ python config.py with experiment_file.json seed=12345`
+**Parameters:**
+- `seed`: fixed random seed for reproducibility
+- `-l ERROR`: reduces Sacred framework verbosity

-while we can silence Sacred just by adding "-l ERROR" at the end of the line (not necessarily at the end).
+### Example Output

-  `$ python config.py with experiment_file.json seed=12345 -l ERROR`
+Results are stored in the specified output folder with detailed performance metrics and hyperparameter information for each dataset and configuration combination.
From 73b10e8f40deac8369b2de3567aba5d95f50f05e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 12:17:12 +0200
Subject: [PATCH 7/8] DOC: Add License section

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index fe948e5..bb6c5f3 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@
   - [Basic Usage](#basic-usage)
   - [Recommended Usage](#recommended-usage)
   - [Example Output](#example-output)
+- [License](#license)

 ## Installation

@@ -163,3 +164,10 @@ python config.py with experiment_file.json seed=12345 -l ERROR
 ### Example Output

 Results are stored in the specified output folder with detailed performance metrics and hyperparameter information for each dataset and configuration combination.
+
+## License
+[BSD 3](LICENSE)
+
+
+
+[Go to Top](#table-of-contents)

From 441839a66478234f767803e0805fd32163e07408 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Sevilla=20Molina?=
Date: Thu, 17 Jul 2025 12:29:20 +0200
Subject: [PATCH 8/8] DOC: Add badges

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index bb6c5f3..6609fc5 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,11 @@
 # ORCA-python

+| Overview | |
+|-----------|------------------------------------------------------------------------------------------------------------------------------------------|
+| **CI/CD** | [![Run Tests](https://github.com/ayrna/orca-python/actions/workflows/pr_pytest.yml/badge.svg?branch=main)](https://github.com/ayrna/orca-python/actions/workflows/pr_pytest.yml) [![!python](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue)](https://www.python.org/) |
+| **Code** | [![!black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![Linter: Ruff](https://img.shields.io/badge/Linter-Ruff-brightgreen?style=flat-square)](https://github.com/charliermarsh/ruff) [![License - BSD 3-Clause](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/ayrna/orca-python/blob/main/LICENSE) |
+
+
 ## What is ORCA-python?

 **ORCA-python** is an experimental framework built on Python that seamlessly integrates with scikit-learn and sacred modules to automate machine learning experiments through simple JSON configuration files. Initially designed for ordinal classification, it supports regular classification algorithms as long as they are compatible with scikit-learn, making it easy to run reproducible experiments across multiple datasets and classification methods.