GitHub - harrlol/XVTools: Ex Vivo internal simple application for running things on Azure

Internal Project Ex Vivo application for running various bioinformatics utilities on Microsoft compute. Set up your amulet project and workspace as guided here before using this tool.

Installation and Usage

git clone https://github.com/harrlol/XVTools.git && cd XVTools && pip install .
conda activate amlt10
az login

Download data from Terra to Microsoft sandbox

xvtools download terra \
  --secret sa.json \
  --gcs-dir gs://<bucket>/PRISM_pilot/analysis/ \
  --dest ./data/patient123 \
  --pattern '*filtered*matrix*' \
  --exclude '.bam,.bai,.fastq,.fastq.gz,.fq,.fq.gz'
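To preview which files a given `--pattern` / `--exclude` pair might select, the sketch below may help. It is not xvtools code. It assumes the pattern is a shell-style glob matched against the file name, and that `--exclude` is a comma-separated list of file-name suffixes; the README confirms neither. The function name `would_download` and the example object names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def would_download(object_name: str, pattern: str, exclude: str) -> bool:
    """Hypothetical filter (an assumption, not the tool's implementation).

    Keep an object if its file name matches the glob `pattern` and does not
    end with any of the comma-separated suffixes in `exclude`.
    """
    name = PurePosixPath(object_name).name
    suffixes = tuple(s.strip() for s in exclude.split(",") if s.strip())
    if suffixes and name.endswith(suffixes):
        return False
    return fnmatchcase(name, pattern)


if __name__ == "__main__":
    exclude = ".bam,.bai,.fastq,.fastq.gz,.fq,.fq.gz"
    # Illustrative object names only; real bucket contents will differ.
    for obj in (
        "PRISM_pilot/analysis/s1/filtered_feature_bc_matrix.h5",
        "PRISM_pilot/analysis/s1/filtered_matrix.bam",
        "PRISM_pilot/analysis/s1/raw_feature_bc_matrix.h5",
    ):
        print(obj, would_download(obj, "*filtered*matrix*", exclude))
```

If the real tool matches against the full object path rather than the file name, results will differ, so treat this only as a quick sanity check of the glob itself.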

Submit an infercnv job to Azure

xvtools submit infercnv-aml \
  --data ./data/patient123 \
  --out ./out/patient123 \
  --ref-group-names-str 'T_Cell, Macrophage, B_Cell' \
  --n-parallel 4 --n-threads 2 --sku 8C15

Notes

The data folder should be either 1) a folder of .h5ad files, or 2) a folder of sample folders, each containing the necessary infercnv files for that sample. Both tree structures are shown below.

1) ├── data/
   │    ├── sample1.h5ad
   │    ├── sample2.h5ad
   │    ├── sample3.h5ad
   │    └── ...

2) ├── data/
   │    ├── sample1/
   │    │    ├── singleCell.counts.matrix
   │    │    ├── cellAnnotations.txt
   │    │    └── gene_ordering_file.txt
   │    ├── sample2/
   │    │    ├── singleCell.counts.matrix
   │    │    ├── cellAnnotations.txt
   │    │    └── gene_ordering_file.txt
   │    ├── sample3/
   │    │    ├── ...
   │    └── ...

For layout 2), please strictly follow the file naming convention as outlined here on the InferCNV wiki.
I recommend running this in a tmux session so that infercnv output is automatically fetched to your local machine. Please also monitor the Azure UI for debugging.
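Before submitting, it can be worth checking that a data folder matches one of the two layouts described in these notes. The following is a minimal sketch and not part of xvtools. The function name `check_data_folder` is hypothetical, and the required file names are taken from layout 2) above.

```python
from pathlib import Path

# File names required in each sample folder, as listed in layout 2).
REQUIRED_FILES = (
    "singleCell.counts.matrix",
    "cellAnnotations.txt",
    "gene_ordering_file.txt",
)


def check_data_folder(data_dir):
    """Return (layout, problems) for a data folder.

    layout is "h5ad", "sample_folders", or "unknown";
    problems is a list of human-readable messages (empty if none).
    """
    root = Path(data_dir)
    entries = sorted(root.iterdir())

    # Layout 1: at least one .h5ad file directly inside the folder.
    if any(p.is_file() and p.suffix == ".h5ad" for p in entries):
        return "h5ad", []

    # Layout 2: one sub-folder per sample, each with the three files.
    sample_dirs = [p for p in entries if p.is_dir()]
    if not sample_dirs:
        return "unknown", [f"{root}: no .h5ad files or sample folders found"]

    problems = [
        f"{d.name}: missing {name}"
        for d in sample_dirs
        for name in REQUIRED_FILES
        if not (d / name).is_file()
    ]
    return "sample_folders", problems
```

For example, `check_data_folder("./data/patient123")` returns the detected layout together with any missing-file messages, which can be printed before running `xvtools submit`.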

Considerations on Parallelization

Each sample is allowed 4 threads by default. When there are many samples, set N_THREADS to a lower number so that N_THREADS * N_PARALLEL < 48.
Set N_PARALLEL close to the number of samples in your run. A high N_PARALLEL significantly boosts efficiency; by contrast, a high N_THREADS with a low N_PARALLEL behaves similarly to single-core processing.
For each job, try to use as many of the available threads as possible.
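The guideline above can be turned into a small calculation: fix N_PARALLEL, then take the largest N_THREADS that keeps the product strictly below 48. This is a minimal sketch. The helper name `max_threads_for` is hypothetical, and the budget of 48 is copied from the rule above; the README does not say whether it changes with `--sku`, so it is left as a parameter.

```python
def max_threads_for(n_parallel: int, budget: int = 48) -> int:
    """Largest N_THREADS such that N_THREADS * N_PARALLEL < budget.

    `budget` defaults to 48, the limit quoted in the guideline; adjust it
    if your compute allows a different total.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    if n_parallel >= budget:
        raise ValueError("n_parallel alone already reaches the budget")
    # Strict inequality: the product may be at most budget - 1.
    return (budget - 1) // n_parallel


if __name__ == "__main__":
    for n_parallel in (4, 12, 20):
        n_threads = max_threads_for(n_parallel)
        print(f"--n-parallel {n_parallel} --n-threads {n_threads}")
```

With 20 samples run in parallel this gives 2 threads each (20 * 2 = 40), while 4 parallel samples could use up to 11 threads each (4 * 11 = 44).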
