From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research
This repository contains the Python scripts for converting Excel-based Lab Data Templates to property-based knowledge graphs to assist lab scientists in systematically navigating the experimental space.
If you like the work and want to use it, please cite our pre-print:
Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Stie Kallesoe, C., Attwood, M., Kohler, M., Gribbon, P. and Witt, G., 2024. From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research. bioRxiv, pp.2024-07. https://doi.org/10.1101/2024.07.18.604030
More datasets and templates can be found on Zenodo:
Witt, G., Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Kallesøe, C. S., Attwood, M., Kohler, M., & Gribbon, P. (2025). Supplementary data files for manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research" [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15234457
The repository is a workflow to generate a knowledge graph from a lab data template, allowing for querying and visualizing experimental data in a meaningful and efficient manner.
.
├── data
│   ├── additional
│   │   ├── assessments
│   │   │   ├── FAIRplusDSM_GNA-NOW_post.pdf
│   │   │   ├── FAIRplusDSM_GNA-NOW_pre.pdf
│   │   │   └── GNA_NOW_WP1_DataSurvey_template.xlsx
│   │   └── templates
│   │       ├── AMR_DataDictionary_v03.xlsx
│   │       ├── LabDataTemplate_in-vitro_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vitro_pre-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_pre-FAIRification.xlsx
│   │       ├── Project-specific_Bacterial-strain-list_v01.xlsx
│   │       └── Project-specific_Compound-list_v01.xlsx
│   ├── exps
│   │   ├── dummy
│   │   │   ├── invitro_dummy_data.xlsx
│   │   │   ├── invivo_dummy_data.xlsx
│   │   │   ├── node_dict.json
│   │   │   ├── processed_invitro_data.tsv
│   │   │   └── processed_invivo_data.tsv
│   │   └── noso-502
│   │       ├── invitro_NBT_MIC.xlsx
│   │       ├── invivo_EMC_DF_Ec.xlsx
│   │       ├── invivo_EMC_DF_Kp.xlsx
│   │       ├── invivo_EMC_PK_Ec.xlsx
│   │       ├── node_dict.json
│   │       ├── processed_invitro_data.tsv
│   │       └── processed_invivo_data.tsv
│   └── mapping_files
│       ├── bacterial_strain.tsv
│       ├── biomaterials.tsv
│       ├── experimental_type.tsv
│       ├── gna_ontology.tsv
│       ├── medium.tsv
│       ├── result_unit.tsv
│       ├── roa.tsv
│       ├── sex.tsv
│       ├── species.tsv
│       ├── statistical_method.tsv
│       └── study_type.tsv
├── GNA-NOW Graph schema.pdf
├── LICENSE
├── README.md
├── requirements.txt
└── src
    ├── constants.py
    ├── data_preprocessing.py
    ├── main.py
    ├── nodes.py
    └── relations.py
- The exps directory contains one directory per experiment, each with pre-filled templates for in vitro and in vivo studies. Here, we show two examples: a "dummy" dataset and a NOSO-502 dataset (internal project data).
- The mapping_files directory contains the ontology-mapped terms of the template. These mappings are tailored to the experiments developed and performed in the project and can be adapted to other use cases using the Ontology Lookup Service (OLS).
- The additional directory contains the preparatory work on the templates and the FAIR assessment results, showing the state before and after FAIRification.
- The src directory consists of all the Python scripts required to transform the lab data template into knowledge graphs.
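To illustrate how a mapping file could be used, the following is a minimal sketch that reads a tab-separated mapping into a lookup dictionary. The function name `load_mapping`, the column names `term` and `ontology_id`, and the example identifiers are all assumptions made for illustration; the actual files in mapping_files may use different headers, and the scripts in src may load them differently.

```python
import csv
import io


def load_mapping(handle, term_column, ontology_column):
    """Return {template term: ontology identifier} from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        term = (row.get(term_column) or "").strip()
        if term:  # skip rows without a term
            mapping[term] = (row.get(ontology_column) or "").strip()
    return mapping


# Made-up example content; real column names and identifiers may differ.
example = io.StringIO(
    "term\tontology_id\n"
    "male\tEXAMPLE:0001\n"
    "female\tEXAMPLE:0002\n"
)
sex_mapping = load_mapping(example, "term", "ontology_id")
print(sex_mapping)
```

For a real file, the same function could be called with an open file handle, for example `open("data/mapping_files/sex.tsv", newline="")`, once the actual column names are known.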
The graph is built on Neo4J. We therefore recommend starting a Neo4J instance in the Desktop version before running the scripts.
- Getting the base Python environment ready
git clone https://github.com/IMI-COMBINE/template2graphs.git
cd template2graphs
conda create --name=template_graph python=3.9
conda activate template_graph
pip install -r requirements.txt
- Making the lab data templates ready for graph ingestion
- Ensure that each of your experiments is located in its own directory under the exps folder. This way, each experiment can be cataloged into a specific directory following FAIR data management guidelines.
- Go to the main.py file and either change the credential details to your own login details, or add a new user with the credentials listed (admin_name = "template2graph", password = "gnanow2024-database") and give that user admin access to the Neo4J database.
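Before running the scripts, it may help to check that the experiments are laid out as expected. The sketch below lists every experiment directory under a given root together with the Excel templates it contains. The helper `list_experiments` is hypothetical and not part of the repository; the demonstration uses a temporary directory with file names borrowed from the dummy example.

```python
import tempfile
from pathlib import Path


def list_experiments(exps_root):
    """Map each experiment directory name to the sorted .xlsx files in it."""
    root = Path(exps_root)
    experiments = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():  # plain files directly under the root are ignored
            experiments[entry.name] = sorted(p.name for p in entry.glob("*.xlsx"))
    return experiments


# Demonstration on a throwaway directory that mimics data/exps.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = Path(tmp) / "dummy"
    exp_dir.mkdir()
    (exp_dir / "invitro_dummy_data.xlsx").touch()
    (exp_dir / "node_dict.json").touch()
    (Path(tmp) / "notes.txt").touch()
    demo = list_experiments(tmp)
print(demo)
```

From the repository root, `list_experiments("data/exps")` would report the experiment directories that are present; a directory with an empty list has no Excel template in it yet.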
cd src
python main.py
The Neo4J graph is now populated and can be explored by the users.
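As a first exploration step, one could count the nodes per label. The sketch below assumes the official `neo4j` Python driver is installed, which may not be the client the repository itself uses. The function name `count_nodes_by_label` is hypothetical, and the Cypher query is generic and does not rely on the project's graph schema.

```python
# Generic Cypher: counts nodes per label combination, independent of the schema.
NODE_COUNT_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, count(*) AS total "
    "ORDER BY total DESC"
)


def count_nodes_by_label(uri, user, password):
    """Run NODE_COUNT_QUERY and return a list of (labels, total) tuples."""
    # Imported lazily so this module loads even without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [
                (tuple(record["labels"]), record["total"])
                for record in session.run(NODE_COUNT_QUERY)
            ]
    finally:
        driver.close()
```

A call might look like `count_nodes_by_label("bolt://localhost:7687", "template2graph", "gnanow2024-database")`, where `bolt://localhost:7687` is Neo4J's default local Bolt address and should be adjusted to match your instance.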
This work and the authors were primarily funded by the following projects: FAIRplus (IMI 802750), COMBINE (IMI 853967), and GNA NOW (IMI 853979).