From d6cfb5e2758483fb989a9b1c2165bafa677f1868 Mon Sep 17 00:00:00 2001
From: Michael Tran
Date: Tue, 26 Aug 2025 11:25:18 -0400
Subject: [PATCH 1/2] feat: Add ORCESTRA and Snakemake general pages

---
 .../disciplines/Data_Science/Data_Curation/.pages | 4 +++-
 .../Data_Science/Data_Curation/orcestra.md | 8 ++++++++
 .../Data_Science/Data_Curation/snakemake.md | 15 +++++++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 docs/disciplines/Data_Science/Data_Curation/orcestra.md
 create mode 100644 docs/disciplines/Data_Science/Data_Curation/snakemake.md

diff --git a/docs/disciplines/Data_Science/Data_Curation/.pages b/docs/disciplines/Data_Science/Data_Curation/.pages
index 6b20de062..25b10580f 100644
--- a/docs/disciplines/Data_Science/Data_Curation/.pages
+++ b/docs/disciplines/Data_Science/Data_Curation/.pages
@@ -2,5 +2,7 @@
 title: Data Curation
 nav:
  - index.md
+ - snakemake.md
+ - orcestra.md
  - IO_Clinical_Trial_Curation
- - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
\ No newline at end of file
+ - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
diff --git a/docs/disciplines/Data_Science/Data_Curation/orcestra.md b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
new file mode 100644
index 000000000..5dfcfe4f7
--- /dev/null
+++ b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
@@ -0,0 +1,8 @@
+# ORCESTRA
+
+ORCESTRA is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.
+
+## See Also
+
+- [Snakemake](snakemake.md)
+- [ORCESTRA Version Controlling](../../../software_development/Version_Control/orcestra_vc.md)
diff --git a/docs/disciplines/Data_Science/Data_Curation/snakemake.md b/docs/disciplines/Data_Science/Data_Curation/snakemake.md
new file mode 100644
index 000000000..0387e40e6
--- /dev/null
+++ b/docs/disciplines/Data_Science/Data_Curation/snakemake.md
@@ -0,0 +1,15 @@
+# Snakemake
+
+[Snakemake](https://snakemake.github.io) is a workflow management system that allows you to create reproducible and scalable data analysis pipelines. It is particularly useful for bioinformatics and data science projects, where complex workflows often involve multiple steps and dependencies. For more general information and tutorials, see the [official documentation](https://snakemake.readthedocs.io/en/stable/).
+
+## Usage in the Lab
+
+Many of our internal data processing pipelines, such as the [RNA-seq Kallisto pipeline](../../Bioinformatics/Tools/RNAseq_Pipelines/kallisto.md#usage), are built using Snakemake, and we also use it to run pipelines for the [ORCESTRA](orcestra.md) platform.
+
+Using Snakemake with the SLURM executor plugin allows us to efficiently manage and execute workflows on high-performance computing clusters, namely H4H. This is especially helpful for large-scale data processing tasks that require significant computational resources and time.
+
+We host many of our Snakemake workflows, such as ORCESTRA PSet processing pipelines, in our [BHKLAB-DataProcessing GitHub organization](https://github.com/BHKLAB-DataProcessing).
+
+## See Also
+
+- [BHKLAB H4H Website](https://bhklab.github.io/HPC4Health/)
From 2d4ce8ec7ffa68f92a2adb68de7ab1acab32e8f1 Mon Sep 17 00:00:00 2001
From: Michael Tran
Date: Wed, 17 Dec 2025 12:53:36 -0500
Subject: [PATCH 2/2] feat: add links to orcestra & docs

---
 docs/disciplines/Data_Science/Data_Curation/orcestra.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/disciplines/Data_Science/Data_Curation/orcestra.md b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
index 5dfcfe4f7..2a7669d73 100644
--- a/docs/disciplines/Data_Science/Data_Curation/orcestra.md
+++ b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
@@ -1,6 +1,8 @@
 # ORCESTRA
 
-ORCESTRA is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.
+[ORCESTRA](https://orcestra.ca) is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.
+
+You can read the official documentation and learn more about the platform on the [ORCESTRA Documentation Page](https://orcestra.ca/app/documentation/overview).
 
 ## See Also
 