STORM is a tensor-factorization framework for reconstructing missing or low-resolution gene expression in spatial transcriptomics data. It combines biologically informed regularization with dynamic λ-scaling so that the loss terms contributed by different data modalities stay balanced during optimization.
.
├── run.sh                          # One‑command runner (bash)
├── requirements.txt
├── src/
│   ├── main_STORM.py               # CLI entry (used by run.sh)
│   ├── train_STORM.py              # Training + evaluation pipeline
│   ├── compute_initial_losses.py   # Dynamic λ initialization
│   └── utils_preprocessing.py      # Gene filtering utilities
├── data/
│   ├── gene_name_interactions.npz  # STRING-like interactions (gene1, gene2, combined_score)
│   └── <SAMPLE>/
│       ├── <SAMPLE>.h5ad           # Full-resolution ground truth
│       ├── <SAMPLE>.tif            # Whole-slide image (WSI)
│       └── <DOWN>.h5ad             # Downsampled input (e.g., MEND90_1234_0.3.h5ad)
└── output/
    └── model_results/
        └── <file_name_root>/       # Metrics + reconstructions per run
1) Install dependencies
pip install -r requirements.txt
2) Prepare data
- Put the ground truth and the WSI under data/<SAMPLE>/.
- Put the downsampled .h5ad and the WSI .tif under the same folder.
3) Run the full pipeline
./run.sh
Edit the variables at the top of run.sh to change the sample name, file name, or paths.
Direct Python alternative (Windows cmd syntax; in bash, replace each trailing ^ with \ and use ./data and ./output):
python -m src.main_STORM ^
--sample MEND90 ^
--file_name MEND90_1234_0.3.h5ad ^
--data_dir .\data ^
--output_dir .\output ^
--string_npz_path gene_name_interactions.npz
Tensor factorization (CP) with rank R: factors A ∈ ℝ^{I×R}, B ∈ ℝ^{J×R}, C ∈ ℝ^{K×R}.
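As a minimal sketch of what a rank-R CP model computes, the snippet below rebuilds an I × J × K tensor from three factor matrices with NumPy. The function name `cp_reconstruct` and the toy sizes are illustrative assumptions, not taken from the STORM source.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 3, 2  # toy sizes, chosen only for illustration

# Random stand-ins for the learned factor matrices
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

def cp_reconstruct(A, B, C):
    """CP reconstruction: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

tensor_hat = cp_reconstruct(A, B, C)
print(tensor_hat.shape)  # (4, 5, 3)
```

Each entry of the reconstruction is a sum of R rank-one contributions, so the model stores (I + J + K) · R parameters instead of I · J · K.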
Loss = weighted MSE (by mean expression within in‑tissue regions) + λ₁R₁ + λ₂R₂ + λ₃R₃ + λ₄R₄
| Term | Description |
|---|---|
| R₁ | L2 regularization |
| R₂ | Alignment between spatial and WSI-derived embeddings |
| R₃ | Spatial Laplacian smoothness regularization |
| R₄ | Gene–gene interaction regularization (STRING network) |
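To illustrate what a Laplacian smoothness term such as R₃ typically looks like, here is a hedged sketch using the standard graph-Laplacian penalty tr(Fᵀ L F), which equals the sum of squared differences between factor rows of neighbouring nodes. The chain graph, the helper names, and the toy factor matrix are assumptions for illustration; the graph STORM actually builds may differ. The same form could plausibly serve a gene-gene term like R₄ if the adjacency were filled with interaction scores.

```python
import numpy as np

def chain_laplacian(n):
    """Unnormalized Laplacian L = D - W of a chain graph 0 - 1 - ... - (n-1)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0  # edge i -> i+1
    W[idx + 1, idx] = 1.0  # symmetric counterpart
    return np.diag(W.sum(axis=1)) - W

def laplacian_penalty(F, L):
    """tr(F^T L F) = sum over edges (i, j) of w_ij * ||F[i] - F[j]||^2."""
    return float(np.trace(F.T @ L @ F))

# Toy factor matrix: 3 neighbouring positions, rank 2 (hypothetical values)
F = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [3.0, 0.0]])
L = chain_laplacian(3)
penalty = laplacian_penalty(F, L)
print(penalty)  # 6.0 = ||F0 - F1||^2 + ||F1 - F2||^2 = 1 + 5
```

The penalty is zero only when connected nodes share identical factor rows, which is why minimizing it encourages smooth embeddings along the graph.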
Dynamic λ‑scaling is computed at epoch 0 to balance the loss terms (λᵢ ∝ WMSE / Rᵢ, where WMSE is the weighted MSE and Rᵢ is the initial value of the i-th regularizer).
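The proportionality above can be sketched as follows. The function name `dynamic_lambdas`, the `scale` constant, the `eps` guard, and the example loss values are all hypothetical; the actual logic lives in compute_initial_losses.py and may differ.

```python
def dynamic_lambdas(wmse0, reg0, scale=1.0, eps=1e-12):
    """Return one weight per regularizer so that lambda_i * R_i ~= scale * WMSE at epoch 0.

    wmse0: weighted MSE measured at epoch 0
    reg0:  initial values of the regularizers R_1..R_4
    scale: assumed proportionality constant (hypothetical)
    eps:   guards against division by zero when a regularizer starts at 0
    """
    return [scale * wmse0 / (r + eps) for r in reg0]

# Made-up epoch-0 values, for illustration only
wmse0 = 2.0
reg0 = [4.0, 0.5, 10.0, 0.2]
lambdas = dynamic_lambdas(wmse0, reg0)

# After scaling, every weighted term starts at about the same magnitude as the WMSE
weighted_terms = [lam * r for lam, r in zip(lambdas, reg0)]
```

With this choice no single regularizer dominates the gradient at the start of training, regardless of its raw scale.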
Saved under output/model_results/<file_name_root>/ (or outputs/... if set):
- loss_history.csv: Total loss per epoch
- metrics.csv: Pearson, MSE, MSE on non‑zero GT
- <file_name_root>_predicted_expression.csv: Reconstructed expression at tissue spots
- tensor_hat_full.npy: Full reconstructed tensor (I × J × K)
- model_params.pt: Learned factors and projection matrix (A, B, C, U)
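For readers who want to recompute the reported metrics from a reconstruction, the sketch below shows one plausible definition of each. It assumes Pearson correlation is taken over all flattened entries and that "non-zero GT" means entries where the ground truth is non-zero; STORM may instead aggregate per gene or per spot. The function name `evaluate` and the toy arrays are illustrative.

```python
import numpy as np

def evaluate(pred, gt):
    """Compute Pearson r, MSE, and MSE restricted to non-zero ground-truth entries."""
    pred = np.asarray(pred, dtype=float).ravel()
    gt = np.asarray(gt, dtype=float).ravel()
    nonzero = gt != 0  # mask of entries with non-zero ground truth
    return {
        "pearson": float(np.corrcoef(pred, gt)[0, 1]),
        "mse": float(np.mean((pred - gt) ** 2)),
        "mse_nonzero_gt": float(np.mean((pred[nonzero] - gt[nonzero]) ** 2)),
    }

# Toy ground truth and prediction (hypothetical values)
gt = np.array([[0.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.0, 2.0], [3.0, 0.0]])
metrics = evaluate(pred, gt)
print(metrics["mse"], metrics["mse_nonzero_gt"])  # 0.5 0.5
```

Reporting the non-zero variant alongside the plain MSE is useful for sparse expression data, where a model predicting all zeros can otherwise look deceptively good.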
Questions or suggestions? Open an issue or a pull request.
