CS 4641 Team 24

This is the repository for the Georgia Tech CS 4641 Fall 2025 project for team 24.

Repository Outline

  • /pages/: Website files
  • /workflows/: Files for Snakemake pipeline
    • /workflows/config/config.yaml: Pipeline config file
    • /workflows/results/: Results/outputs from pipeline
      • /workflows/results/brain-tumor-50/: Results from local pipeline testing
      • /workflows/results/brain-tumor-dataset/: Actual results folder
    • /workflows/rules/: Rule files defining Snakemake pipeline
    • /workflows/scripts/: Python scripts for Snakemake pipeline
    • /workflows/Snakefile: Main Snakemake entrypoint
    • /workflows/setup_slurm.sh: Script that sets up the Snakemake command for PACE
  • /src/daikon/: Local development package and code
    • /src/daikon/eda/: Python functions and dataclasses for EDA
    • /src/daikon/models/: Python functions and dataclasses for setting up ML models
    • /src/daikon/preprocessing/: Python functions and dataclasses for preprocessing images
  • /tests/: Jupyter notebooks to test code before creating functions or adding to pipeline
  • /docs/: Notes from the development process
  • /environments/: Conda environment files for local development and PACE

Notes

This repository is currently a work in progress. Read below for more information and installation instructions.

Structure

Most of the data analysis pipelines are set up with Snakemake for its Slurm integration, so much of the repository is structured around it. All of the workflow-related scripts are contained in the workflows/ folder, and local development code is contained in the local package daikon, which lives in the src/ folder.

In the workflows/ folder, all the Snakemake workflows start from the Snakefile, which includes all of the files inside the rules/ folder. Snakemake operates on rules that declare which scripts produce which files, and it chains rules together to produce the requested output files. The scripts that the rules call all live in the scripts/ folder.
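For a rough idea of what this looks like, here is a minimal sketch of a Snakemake rule (the rule name, paths, and script are hypothetical, not taken from this repository):

    rule preprocess_image:
        input:
            "data/{dataset}/{cls}/{image}.jpg"
        output:
            "results/{dataset}/preprocessed/{cls}/{image}.npy"
        script:
            "../scripts/preprocess.py"

Asking Snakemake for a results/... file makes it match the rule's output pattern and run the script on the corresponding input.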

All of the results end up in the results/ folder. The rules create a folder inside results/ corresponding to the name of each dataset in the <project root>/data/ folder. The data/ folder is not committed to git because it is too large. Locally, we use a small subset of 50 images per class for testing; on PACE, we store our images in the scratch folder (300 GB capacity) and connect it to where the pipeline expects the folder to be via a symlink.
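Concretely, the layout looks roughly like this (the dataset name is illustrative):

    <project root>/
    ├── data/                          # not committed; on PACE, a symlink into ~/scratch/
    │   └── brain-tumor-dataset/
    └── workflows/
        └── results/
            └── brain-tumor-dataset/   # one results folder per dataset in data/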

Note: The local development package is named daikon for no reason at all -- I just decided that was a good name for it.

Installation

Setup environment

To create the conda environment, run the following command from the root of the repository:

conda env create --file environments/environment.yml

Then, before running the scripts, run the following to activate the environment:

conda activate cs4641-project

Finally, from the root of the repository install the local package manually:

pip install -e .

This installs all dependencies and sets up the local package in editable mode, so you can edit the code and import it from whatever scripts you want to run.
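To quickly sanity-check the install (a minimal example; it only assumes the package imports):

    python -c "import daikon; print(daikon.__file__)"

This should print a path inside src/, confirming the editable install points at your working copy.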

Set up on PACE-ICE

Setting up the repository on PACE is similar. Follow the instructions below to get started. This setup takes a bit more time, but it pays off when there is a lot of data to process.

  1. Get access to PACE-ICE and use ssh to connect to PACE.

  2. Activate Anaconda 3

    module load anaconda3
    
  3. Download the dataset files into the scratch folder. This folder has a larger capacity than your home folder. Make sure the dataset folder is directly in the scratch folder (your dataset should be at ~/scratch/brain-tumor-dataset/).

  4. Clone this repository and cd into the root of the repository

  5. Create a symlink to the scratch folder to serve as your data folder:

    ln -s ~/scratch/ data
    
  6. Create the PACE environment, which is different from the normal development environment:

    conda env create --file environments/pace.yml
    
  7. Activate the environment

    conda activate pace
    
  8. To run the Snakemake pipeline, run snakemake as you normally would from the workflows/ folder, but instead of calling snakemake -c 1, call ./setup_slurm.sh <output files> (see the example after this list).
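For example, with a purely hypothetical output target (pass whichever output files your rules actually define):

    ./setup_slurm.sh results/brain-tumor-dataset/preprocessed.done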

For more information, check out Intro to PACE ICE.

Pages

To set up the pages, you first need to install Bun. Then, run the following steps to start website development:

  1. Enter the pages/ directory

    cd pages/
    
  2. Install website dependencies

    bun ci
    
  3. To run the website in development mode, run

    bun run dev
    
  4. Once you have made your changes and want to deploy the website, run

    bun run deploy
    
