This repository contains the code and data for the paper "Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features".
The paper is currently under review at Algorithms for Molecular Biology. A preprint version is available on bioRxiv.
The EmbeddingsAndKernels folder contains separate scripts for computing the embeddings used in the paper.
The scripts are independent of one another and can be run in any order. Each script loads the metabolic pathways dataset (an example dataset is provided in the data folder and described below), computes the corresponding embeddings or kernels, and saves the results in a pickle file.
Detailed instructions for each embedding method are provided in the Wiki.
To run the code, you need to install the following Python packages:
torch==2.7.1
torch_geometric==2.6.1
hypergraphx==1.7.7
karateclub==1.2.1
networkx==3.5
scipy==1.15.3
pyclustertend==1.9.0
multiprocess==0.70.18
The code has been tested with Python 3.12. Preliminary experiments have shown compatibility issues with later Python versions (especially with karateclub and pyclustertend).
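Since the packages are pinned to exact versions, it can help to check an environment before running the scripts. The snippet below is a minimal sketch of such a check and is not part of the repository: the `PINNED` dictionary and the `check_environment` helper are illustrative names, and the check strips local build suffixes (for example `+cpu` on torch wheels) before comparing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the dictionary name is illustrative.
PINNED = {
    "torch": "2.7.1",
    "torch_geometric": "2.6.1",
    "hypergraphx": "1.7.7",
    "karateclub": "1.2.1",
    "networkx": "3.5",
    "scipy": "1.15.3",
    "pyclustertend": "1.9.0",
    "multiprocess": "0.70.18",
}


def check_environment(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build label such as "+cu126" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if sys.version_info[:2] != (3, 12):
    print("Note: the code has been tested with Python 3.12 only.")

for package, (expected, installed) in check_environment().items():
    print(f"{package}: expected {expected}, found {installed}")
```

An empty result from `check_environment()` means every pinned package is installed at the listed version.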
An example of the metabolic pathways dataset used in the paper is available in the file data/MetabolicPathways_DEMO_DATASET_Python.pkl. It is a reduced version of the full dataset (5 organisms only) and is intended for demonstration purposes only. The full list of organisms is available as a supplementary file of the paper.
The pickle file contains a dictionary with 'DATASET' as the key and a list of dictionaries as the value. Each dictionary in the list represents an organism and contains the following keys:
'ID': the ID of the organism
'simplices_nodelabels': the hyperedge list of the organism, where each hyperedge is represented as an n-tuple of node labels (strings), with n being the number of nodes in the hyperedge.
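The structure described above can be read with the standard library alone. The following is a minimal sketch rather than code from the repository: `load_organisms` and `summarize` are illustrative helper names, and the toy dataset (organism IDs and node labels) is made up so that the example runs without the demo file. To inspect the real data, pass the path data/MetabolicPathways_DEMO_DATASET_Python.pkl to `load_organisms`.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def load_organisms(path):
    """Return the list of organism dictionaries stored under the 'DATASET' key."""
    with open(path, "rb") as handle:
        payload = pickle.load(handle)
    return payload["DATASET"]


def summarize(organism):
    """Count nodes, hyperedges, and hyperedge sizes for one organism."""
    hyperedges = organism["simplices_nodelabels"]
    nodes = {label for edge in hyperedges for label in edge}
    sizes = Counter(len(edge) for edge in hyperedges)
    return {
        "ID": organism["ID"],
        "n_nodes": len(nodes),
        "n_hyperedges": len(hyperedges),
        "size_histogram": dict(sizes),
    }


# Toy stand-in for the demo file; the IDs and node labels are made up.
toy = {
    "DATASET": [
        {"ID": "org_A", "simplices_nodelabels": [("a", "b"), ("b", "c", "d")]},
        {"ID": "org_B", "simplices_nodelabels": [("a", "c", "e")]},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_dataset.pkl"
    with open(toy_path, "wb") as handle:
        pickle.dump(toy, handle)
    summaries = [summarize(org) for org in load_organisms(toy_path)]

for row in summaries:
    print(row)
```

Each hyperedge is kept as a tuple of string labels, so its length gives the hyperedge size directly.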
If you use this code in your research, please cite the paper as follows:
@article {Cervellini2025.07.10.663860,
author = {Cervellini, Mattia and Sinaimeri, Blerina and Matias, Catherine and Martino, Alessio},
title = {Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features},
elocation-id = {2025.07.10.663860},
year = {2025},
doi = {10.1101/2025.07.10.663860},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.1101/2025.07.10.663860v3},
journal = {bioRxiv}
}