From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research

This repository contains the Python scripts for converting Excel-based Lab Data Templates to property-based knowledge graphs to assist lab scientists in systematically navigating the experimental space.

Publication

If you like the work and want to use it, please cite our pre-print:

Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Stie Kallesoe, C., Attwood, M., Kohler, M., Gribbon, P. and Witt, G., 2024. From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research. bioRxiv, pp.2024-07. https://doi.org/10.1101/2024.07.18.604030

More datasets and templates can be found on Zenodo:

Witt, G., Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Kallesøe, C. S., Attwood, M., Kohler, M., & Gribbon, P. (2025). Supplementary data files for manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research" [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15234457

How to use the repository?

The repository provides a workflow that generates a knowledge graph from a lab data template, so that experimental data can be queried and visualized efficiently.

Directory overview

.
├── data
│   ├── additional
│   │   ├── assessments
│   │   │   ├── FAIRplusDSM_GNA-NOW_post.pdf
│   │   │   ├── FAIRplusDSM_GNA-NOW_pre.pdf
│   │   │   └── GNA_NOW_WP1_DataSurvey_template.xlsx
│   │   └── templates
│   │       ├── AMR_DataDictionary_v03.xlsx
│   │       ├── LabDataTemplate_in-vitro_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vitro_pre-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_pre-FAIRification.xlsx
│   │       ├── Project-specific_Bacterial-strain-list_v01.xlsx
│   │       └── Project-specific_Compound-list_v01.xlsx
│   ├── exps
│   │   ├── dummy
│   │   │   ├── invitro_dummy_data.xlsx
│   │   │   ├── invivo_dummy_data.xlsx
│   │   │   ├── node_dict.json
│   │   │   ├── processed_invitro_data.tsv
│   │   │   └── processed_invivo_data.tsv
│   │   └── noso-502
│   │       ├── invitro_NBT_MIC.xlsx
│   │       ├── invivo_EMC_DF_Ec.xlsx
│   │       ├── invivo_EMC_DF_Kp.xlsx
│   │       ├── invivo_EMC_PK_Ec.xlsx
│   │       ├── node_dict.json
│   │       ├── processed_invitro_data.tsv
│   │       └── processed_invivo_data.tsv
│   └── mapping_files
│       ├── bacterial_strain.tsv
│       ├── biomaterials.tsv
│       ├── experimental_type.tsv
│       ├── gna_ontology.tsv
│       ├── medium.tsv
│       ├── result_unit.tsv
│       ├── roa.tsv
│       ├── sex.tsv
│       ├── species.tsv
│       ├── statistical_method.tsv
│       └── study_type.tsv
├── GNA-NOW Graph schema.pdf
├── LICENSE
├── README.md
├── requirements.txt
└── src
    ├── constants.py
    ├── data_preprocessing.py
    ├── main.py
    ├── nodes.py
    └── relations.py
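Because each experiment lives in its own folder under data/exps, the layout above can be walked with the standard library. The sketch below is not part of the repository. The helper name find_experiments is hypothetical, and the helper only lists the .xlsx files it finds in each folder without validating their contents.

```python
from pathlib import Path
from typing import Dict, List


def find_experiments(exps_dir: Path) -> Dict[str, List[str]]:
    """Map each experiment folder under exps_dir to its sorted .xlsx file names.

    Plain files directly inside exps_dir are ignored; only sub-directories
    count as experiments. Non-.xlsx files (e.g. node_dict.json) are skipped.
    """
    experiments: Dict[str, List[str]] = {}
    for exp in sorted(p for p in exps_dir.iterdir() if p.is_dir()):
        experiments[exp.name] = sorted(f.name for f in exp.glob("*.xlsx"))
    return experiments
```

Run from the repository root, find_experiments(Path("data") / "exps") would return one entry per experiment folder (here dummy and noso-502), each listing that folder's template files.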

The exps directory contains one sub-directory per experiment, each holding pre-filled templates for in vitro and in vivo studies. It includes a "dummy" example dataset and the NOSO-502 dataset (internal project data).
The mapping_files directory contains the ontology mappings for the template terms. These are tailored to the experiments designed and performed in the project and can be adapted to other use cases with the Ontology Lookup Service (OLS).
The additional directory contains the preparatory work on the templates and the FAIR assessment results, documenting the state of the templates before and after FAIRification.
The src directory contains the Python scripts that transform the lab data templates into knowledge graphs.
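The mapping files are tab-separated, so a template term can be looked up with the csv module from the standard library. This is a minimal sketch under assumptions: the column names are passed in as parameters because the actual headers of the files in data/mapping_files are not documented here, and load_mapping is a hypothetical helper rather than a function from src.

```python
import csv
from pathlib import Path
from typing import Dict


def load_mapping(tsv_path: Path, term_col: str, id_col: str) -> Dict[str, str]:
    """Read a tab-separated mapping file into {template term: ontology ID}.

    term_col and id_col are placeholders: check the header row of the
    actual file before using this. Rows missing either value are skipped.
    """
    mapping: Dict[str, str] = {}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            term = (row.get(term_col) or "").strip()
            ontology_id = (row.get(id_col) or "").strip()
            if term and ontology_id:
                mapping[term] = ontology_id
    return mapping
```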

Prerequisite

The graph is built in Neo4j, so start a Neo4j instance (for example, in Neo4j Desktop) before running the scripts.
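Before running the scripts, you can check that the local instance is listening. This is a minimal sketch, assuming Neo4j runs on localhost with its default Bolt port 7687. It tests TCP reachability only, not credentials or whether the listener is actually Neo4j, and bolt_port_open is a hypothetical name.

```python
import socket


def bolt_port_open(host: str = "localhost", port: int = 7687, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```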

Step-by-step process

Getting the base Python environment ready

git clone https://github.com/IMI-COMBINE/template2graphs.git
cd template2graphs
conda create --name=template_graph python=3.9
conda activate template_graph
pip install -r requirements.txt
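To confirm that the pip step succeeded, the following sketch reads the requirement names and reports any that are not installed in the active environment. It assumes requirements.txt uses simple name==version style lines and does not handle every pip syntax. The helper names are hypothetical and not part of the repository.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List


def parse_requirement_names(text: str) -> List[str]:
    """Extract bare distribution names from requirements.txt content."""
    names: List[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        # The name ends at the first version specifier, extra, marker or space.
        name = re.split(r"[<>=!~\[;@\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names for which no installed distribution is found."""
    missing: List[str] = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository root, missing_packages(parse_requirement_names(open("requirements.txt").read())) returning an empty list means every listed package was found.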

Making the lab data templates ready for graph ingestion

Ensure that each experiment is stored in its own directory under the exps folder, so that every experiment is catalogued in a dedicated directory in line with FAIR data management guidelines.
In main.py, either replace the Neo4j credentials with your own login details, or create a new Neo4j user with the listed credentials (admin_name = "template2graph", password = "gnanow2024-database") and grant it admin access to the database.
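If you prefer not to edit credentials in main.py, one option is to read them from environment variables. This is a hypothetical sketch, not something main.py currently does: the variable names NEO4J_USER and NEO4J_PASSWORD are assumptions, and the fallbacks are the credentials listed above.

```python
import os
from typing import Tuple

# Credentials listed in this README for the default database user.
DEFAULT_USER = "template2graph"
DEFAULT_PASSWORD = "gnanow2024-database"


def neo4j_credentials() -> Tuple[str, str]:
    """Return (user, password), preferring the hypothetical NEO4J_USER / NEO4J_PASSWORD variables."""
    user = os.environ.get("NEO4J_USER", DEFAULT_USER)
    password = os.environ.get("NEO4J_PASSWORD", DEFAULT_PASSWORD)
    return user, password
```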

cd src
python main.py

The Neo4j graph is now populated and ready to be explored.

Funding

This work and the authors were primarily funded by the following projects: FAIRplus (IMI 802750), COMBINE (IMI 853967), and GNA NOW (IMI 853979).
