Spatial Transcriptomics Optimization by Resolution via Matrix-factorization

denizgurarslan/STORM

🧬 STORM — Spatial Transcriptomics Optimization by Resolution via Matrix-Factorization

STORM is a tensor-factorization framework for reconstructing missing or low-resolution gene expression in spatial transcriptomics data. It combines biologically informed regularization with dynamic λ-scaling, which keeps the loss terms from the different data modalities balanced during optimization.

 

📂 Project Folder Structure

.
├── run.sh                              # One‑command runner (bash)
├── requirements.txt
├── src/
│   ├── main_STORM.py                   # CLI entry (used by run.sh)
│   ├── train_STORM.py                  # Training + evaluation pipeline
│   ├── compute_initial_losses.py       # Dynamic λ initialization
│   └── utils_preprocessing.py          # Gene filtering utilities
├── data/
│   ├── gene_name_interactions.npz      # STRING-like interactions (gene1, gene2, combined_score)
│   └── <SAMPLE>/
│       ├── <SAMPLE>.h5ad               # Full-resolution ground truth
│       ├── <SAMPLE>.tif                # Whole-slide image (WSI)
│       └── <DOWN>.h5ad                 # Downsampled input (e.g., MEND90_1234_0.3.h5ad)
└── output/
    └── model_results/
        └── <file_name_root>/           # Metrics + reconstructions per run

 

🚀 Quick Start

1) Install dependencies

pip install -r requirements.txt

2) Prepare data

  • Put the full-resolution ground truth (<SAMPLE>.h5ad) and the WSI (<SAMPLE>.tif) under data/<SAMPLE>/.
  • Put the downsampled .h5ad (e.g., MEND90_1234_0.3.h5ad) in the same folder.
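
Before launching a run, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name check_sample_layout is hypothetical, and it assumes gene_name_interactions.npz sits directly under the data directory as shown in the folder structure.

```python
from pathlib import Path


def check_sample_layout(data_dir, sample, down_file):
    """Return the expected input files that are missing (hypothetical helper)."""
    sample_dir = Path(data_dir) / sample
    expected = [
        sample_dir / f"{sample}.h5ad",                   # full-resolution ground truth
        sample_dir / f"{sample}.tif",                    # whole-slide image (WSI)
        sample_dir / down_file,                          # downsampled input
        Path(data_dir) / "gene_name_interactions.npz",   # gene-gene interaction table
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Sample and file names taken from the example command in this README.
    for path in check_sample_layout("data", "MEND90", "MEND90_1234_0.3.h5ad"):
        print(f"missing: {path}")
```

An empty return value means all four inputs were found.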

3) Run the full pipeline

./run.sh

Edit the top variables inside run.sh to change sample name, file name, or paths.

Direct Python alternative (Windows cmd syntax shown; in bash, replace each ^ with \ and use ./data and ./output):

python -m src.main_STORM ^
    --sample MEND90 ^
    --file_name MEND90_1234_0.3.h5ad ^
    --data_dir .\data ^
    --output_dir .\output ^
    --string_npz_path gene_name_interactions.npz

 

🧠 Model Details

Tensor factorization (CP) with rank R: factors A ∈ R^{I×R}, B ∈ R^{J×R}, C ∈ R^{K×R}.
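
As an illustration of the CP model, the reconstructed tensor is a sum of R rank-one terms built from the columns of A, B and C. The sketch below uses NumPy with random factors. The sizes are arbitrary, the function name cp_reconstruct is hypothetical, and the README does not say which tensor mode corresponds to which data axis, so no such mapping is assumed here.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """X_hat[i, j, k] = sum over r of A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3          # illustrative sizes only
A = rng.random((I, R))
B = rng.random((J, R))
C = rng.random((K, R))

X_hat = cp_reconstruct(A, B, C)
print(X_hat.shape)               # (4, 5, 6)
```

The full tensor has I·J·K entries, while the factors hold only (I + J + K)·R parameters, which is what lets the model fill in unobserved entries.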

Loss = weighted MSE (by mean expression within in‑tissue regions) + λ₁R₁ + λ₂R₂ + λ₃R₃ + λ₄R₄

Term | Description
R₁ | L2 regularization
R₂ | Alignment between spatial and WSI-derived embeddings
R₃ | Spatial Laplacian smoothness regularization
R₄ | Gene–gene interaction regularization (STRING network)
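
Graph-based terms such as R₃ and R₄ are commonly written as a Laplacian quadratic form, which penalizes differences between the factor rows of connected nodes. The sketch below shows that generic form only. It is an assumption that STORM uses exactly this expression, and the weight matrix W (for R₄ it could be derived from combined_score) and the function name laplacian_penalty are illustrative.

```python
import numpy as np


def laplacian_penalty(F, W):
    """tr(F^T L F) with L = D - W, for a symmetric non-negative weight matrix W.

    Equals 0.5 * sum_ij W[i, j] * ||F[i] - F[j]||^2, so strongly connected
    nodes are pushed toward similar factor rows.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(F.T @ L @ F))


# Toy graph with three nodes: 0-1 strongly linked, 1-2 weakly linked.
W = np.array([[0.0, 0.9, 0.0],
              [0.9, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
F = np.array([[1.0, 0.0],
              [1.0, 0.1],
              [0.0, 1.0]])

penalty = laplacian_penalty(F, W)
print(penalty)
```

For a spatial term, W would instead encode adjacency between neighboring spots; the penalty is zero when all connected rows are identical.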

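Dynamic λ‑scaling: each λᵢ is computed at epoch 0 from the initial loss values (λᵢ ∝ WMSE / Rᵢ), so that every weighted term λᵢRᵢ starts on a scale comparable to the weighted MSE (WMSE).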
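
The scaling rule can be sketched in a few lines. This example assumes a proportionality constant of 1 and made-up initial loss values; the function name init_lambdas and the eps guard against division by zero are illustrative, not taken from compute_initial_losses.py.

```python
def init_lambdas(wmse0, reg0, eps=1e-12):
    """lambda_i = wmse0 / reg0[i], so each lambda_i * reg0[i] equals wmse0 at epoch 0."""
    return [wmse0 / max(r, eps) for r in reg0]


wmse0 = 0.5                          # weighted MSE at epoch 0 (made-up value)
reg0 = [200.0, 0.05, 10.0, 2.0]      # R1..R4 at epoch 0 (made-up values)

lambdas = init_lambdas(wmse0, reg0)
weighted = [lam * r for lam, r in zip(lambdas, reg0)]
print(lambdas)
print(weighted)
```

With this choice, a regularizer that starts large (here R₁) receives a small weight and one that starts small (here R₂) receives a large weight, so no single term dominates the first gradient steps.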

 

📤 Outputs

Saved under output/model_results/<file_name_root>/ (or outputs/... if set):

  • loss_history.csv — Total loss per epoch
  • metrics.csv — Pearson, MSE, MSE on non‑zero GT
  • <file_name_root>_predicted_expression.csv — Reconstructed expression at tissue spots
  • tensor_hat_full.npy — Full reconstructed tensor (I × J × K)
  • model_params.pt — Learned factors and projection matrix (A, B, C, U)
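
For reference, the three quantities reported in metrics.csv can be computed as follows. This is a sketch over flattened arrays with toy values. Whether the pipeline aggregates globally or per gene is not stated in this README, so the global form here is an assumption, as is the function name reconstruction_metrics.

```python
import numpy as np


def reconstruction_metrics(gt, pred):
    """Pearson r, MSE, and MSE restricted to entries where the ground truth is non-zero."""
    gt = np.asarray(gt, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    nonzero = gt != 0
    return {
        "pearson": float(np.corrcoef(gt, pred)[0, 1]),
        "mse": float(np.mean((gt - pred) ** 2)),
        "mse_nonzero_gt": float(np.mean((gt[nonzero] - pred[nonzero]) ** 2)),
    }


# Toy ground truth and prediction (not real data).
gt = np.array([[0.0, 2.0], [4.0, 0.0]])
pred = np.array([[0.5, 1.5], [3.0, 0.0]])

metrics = reconstruction_metrics(gt, pred)
print(metrics)
```

Restricting the MSE to non-zero ground truth is useful for sparse expression data, where a model that predicts values near zero everywhere can otherwise score a deceptively low overall MSE.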

 

📬 Contact

Questions or suggestions? Open an issue or a pull request.
