
Source code for paper titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research"


IMI-COMBINE/template2graphs


From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research

This repository contains the Python scripts that convert Excel-based Lab Data Templates into property-based knowledge graphs, helping lab scientists navigate the experimental space systematically.

Publication

If you use this work, please cite our preprint:

Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Stie Kallesoe, C., Attwood, M., Kohler, M., Gribbon, P. and Witt, G., 2024. From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research. bioRxiv, pp.2024-07. https://doi.org/10.1101/2024.07.18.604030

More datasets and templates can be found on Zenodo:

Witt, G., Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Kallesøe, C. S., Attwood, M., Kohler, M., & Gribbon, P. (2025). Supplementary data files for manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research" [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15234457

How to use the repository?

The repository provides a workflow that generates a knowledge graph from a lab data template, so that experimental data can be queried and visualized efficiently.

Directory overview

.
├── data
│   ├── additional
│   │   ├── assessments
│   │   │   ├── FAIRplusDSM_GNA-NOW_post.pdf
│   │   │   ├── FAIRplusDSM_GNA-NOW_pre.pdf
│   │   │   └── GNA_NOW_WP1_DataSurvey_template.xlsx
│   │   └── templates
│   │       ├── AMR_DataDictionary_v03.xlsx
│   │       ├── LabDataTemplate_in-vitro_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vitro_pre-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_post-FAIRification.xlsx
│   │       ├── LabDataTemplate_in-vivo_pre-FAIRification.xlsx
│   │       ├── Project-specific_Bacterial-strain-list_v01.xlsx
│   │       └── Project-specific_Compound-list_v01.xlsx
│   ├── exps
│   │   ├── dummy
│   │   │   ├── invitro_dummy_data.xlsx
│   │   │   ├── invivo_dummy_data.xlsx
│   │   │   ├── node_dict.json
│   │   │   ├── processed_invitro_data.tsv
│   │   │   └── processed_invivo_data.tsv
│   │   └── noso-502
│   │       ├── invitro_NBT_MIC.xlsx
│   │       ├── invivo_EMC_DF_Ec.xlsx
│   │       ├── invivo_EMC_DF_Kp.xlsx
│   │       ├── invivo_EMC_PK_Ec.xlsx
│   │       ├── node_dict.json
│   │       ├── processed_invitro_data.tsv
│   │       └── processed_invivo_data.tsv
│   └── mapping_files
│       ├── bacterial_strain.tsv
│       ├── biomaterials.tsv
│       ├── experimental_type.tsv
│       ├── gna_ontology.tsv
│       ├── medium.tsv
│       ├── result_unit.tsv
│       ├── roa.tsv
│       ├── sex.tsv
│       ├── species.tsv
│       ├── statistical_method.tsv
│       └── study_type.tsv
├── GNA-NOW Graph schema.pdf
├── LICENSE
├── README.md
├── requirements.txt
└── src
    ├── constants.py
    ├── data_preprocessing.py
    ├── main.py
    ├── nodes.py
    └── relations.py
  • The exps directory contains one directory per experiment, each holding pre-filled templates for in vitro and in vivo studies. Here, we show two examples: a "dummy" dataset and a NOSO-502 dataset (internal project data).
  • The mapping_files directory contains the ontology-mapped terms used in the template. The mappings are tailored to the experiments developed and performed in the project and can be adapted to other use cases with the Ontology Lookup Service (OLS).
  • The additional directory contains the preparatory work on the templates and the FAIR assessment results. It shows the state of the data before and after FAIRification.
  • The src directory contains all the Python scripts required to transform the lab data template into knowledge graphs.
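The layout described above can be inspected with a few lines of Python before running the pipeline. The following is a minimal sketch, not part of the repository's scripts: the helper names `list_experiments` and `read_mapping` are hypothetical, and the mapping files are assumed to start with a header row (their column names are not assumed).

```python
import csv
from pathlib import Path


def list_experiments(exps_dir):
    """Map each experiment directory name to the sorted names of its .xlsx files."""
    experiments = {}
    for exp_dir in sorted(Path(exps_dir).iterdir()):
        # Only sub-directories count as experiments; loose files are ignored.
        if exp_dir.is_dir():
            experiments[exp_dir.name] = sorted(p.name for p in exp_dir.glob("*.xlsx"))
    return experiments


def read_mapping(tsv_path):
    """Read a tab-separated mapping file into a list of row dictionaries.

    Assumption: the first row is a header. Column names are taken from the
    file itself, so none are hard-coded here.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

From the repository root, `list_experiments("data/exps")` would list the templates found per experiment, and `read_mapping("data/mapping_files/species.tsv")` would return the rows of one mapping file for review before adapting it to another project.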

Prerequisite

The graph is built on Neo4J. We therefore recommend starting a Neo4J instance in Neo4J Desktop before running the scripts.
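A quick way to confirm that the instance is running is to test whether its Bolt port accepts connections. The sketch below uses only the standard library; the function name `bolt_port_is_open` is hypothetical, and it assumes the instance listens on Neo4J's default Bolt port (7687) on the local machine.

```python
import socket


def bolt_port_is_open(host="localhost", port=7687, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Assumption: the Neo4J instance uses the default Bolt port 7687.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

This only shows that something is listening on the port. It does not verify that the listener is Neo4J or that the credentials used by the scripts are valid.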

Step-by-step process

  1. Getting the base Python environment ready
git clone https://github.com/IMI-COMBINE/template2graphs.git
cd template2graphs
conda create --name=template_graph python=3.9
conda activate template_graph
pip install -r requirements.txt
  2. Making the Lab data templates ready for graph ingestion
  • Ensure that each of your experiments is stored in its own directory under the exps folder. This way, each experiment can be cataloged in a specific directory following FAIR data management guidelines.
  • Open the main.py file and either replace the credentials with your own login details, or create a new Neo4J user with the listed credentials (admin_name = "template2graph", password = "gnanow2024-database") and grant that user admin access to the Neo4J database.
cd src
python main.py

The Neo4J graph is now populated and ready to be explored.
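As a first look at the populated graph, one can count the nodes per label. The following is a minimal sketch under stated assumptions: it uses the official `neo4j` Python driver, which this README does not confirm is the driver used by the scripts, and the function names are hypothetical. The node labels themselves are not assumed; they are read from the database.

```python
# Cypher query that counts nodes for every label present in the database.
LABEL_COUNT_QUERY = (
    "MATCH (n) "
    "UNWIND labels(n) AS label "
    "RETURN label, count(*) AS total "
    "ORDER BY total DESC"
)


def count_nodes_by_label(session):
    """Run the label-count query on an open session and return {label: total}."""
    return {record["label"]: record["total"] for record in session.run(LABEL_COUNT_QUERY)}


def summarize_graph(uri, user, password):
    """Connect to Neo4J and count nodes per label.

    Assumption: the official `neo4j` driver is installed (pip install neo4j).
    """
    from neo4j import GraphDatabase  # imported lazily so the helper above has no dependency

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return count_nodes_by_label(session)
    finally:
        driver.close()
```

With the credentials listed in step 2 and a local instance, the call would look like `summarize_graph("bolt://localhost:7687", "template2graph", "gnanow2024-database")`. The URI is an assumption based on Neo4J's default Bolt address.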

Funding

This work and the authors were primarily funded by the following projects: FAIRplus (IMI 802750), COMBINE (IMI 853967), and GNA NOW (IMI 853979).
