Pre-print available on bioRxiv.
DeepShape is a deep convolutional neural network designed to predict molecular phenotypes from DNA sequences. Unlike traditional models that rely solely on one-hot encoded DNA sequences, DeepShape integrates DNA structural attributes indicative of local shape: minor groove width (MGW), helical twist (HelT), propeller twist (ProT), roll, and electrostatic potential (EP). This combination enhances the interpretability of the model and helps identify regulatory patterns that are not apparent from sequence information alone.
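To make the combined input concrete, the sketch below stacks a one-hot encoding with one track per shape feature. It is an illustration only: the function names, the channel order, the placeholder values, and the assumption that every shape track has already been aligned to one value per base are not taken from the DeepShape code, whose actual preprocessing lives in the repository's helper scripts.

```python
# Illustrative sketch only: names, channel order, and values are assumptions,
# not the DeepShape implementation.

BASES = "ACGT"
SHAPE_FEATURES = ["MGW", "HelT", "ProT", "Roll", "EP"]


def one_hot(sequence):
    """Return 4 rows (A, C, G, T), each with one value per base."""
    sequence = sequence.upper()
    # Bases outside ACGT (for example N) become all-zero columns.
    return [[1.0 if base == b else 0.0 for base in sequence] for b in BASES]


def combine(sequence, shape_tracks):
    """Stack the one-hot rows with one row per shape feature."""
    rows = one_hot(sequence)
    for name in SHAPE_FEATURES:
        track = shape_tracks[name]
        if len(track) != len(sequence):
            raise ValueError(f"{name} track length does not match the sequence")
        rows.append([float(v) for v in track])
    return rows


if __name__ == "__main__":
    seq = "ACGTN"
    # Placeholder values; real tracks would come from a DNA shape predictor.
    tracks = {name: [0.0] * len(seq) for name in SHAPE_FEATURES}
    matrix = combine(seq, tracks)
    print(len(matrix), len(matrix[0]))  # channels, sequence length
```

With four sequence channels and five shape channels, each position is described by nine values under this layout.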
DeepShape is built upon DeeperDeepSEA, a PyTorch-based deep learning model originally designed to predict chromatin features from DNA sequence alone, as implemented in Selene.
The environment.yaml file provided in this repository contains the dependencies required to run DeepShape.
Create the new conda environment:
conda env create -f environment.yaml
Activate the conda environment:
conda activate dnashapeenv
Once the environment is activated, you are ready to run DeepShape with all necessary dependencies installed.
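A quick way to confirm that the activated environment is usable is to check that key packages can be found. The sketch below is a minimal example; the package names (torch for PyTorch and selene_sdk for Selene) are assumptions based on the description above, and environment.yaml remains the authoritative list of dependencies.

```python
# Sketch: confirm that packages are importable in the active environment.
# The package names below are assumptions; environment.yaml is authoritative.
import importlib.util

ASSUMED_PACKAGES = ["torch", "selene_sdk"]


def check_packages(names):
    """Map each top-level package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If a package is reported as missing, recreating the environment from environment.yaml is the likely fix.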
The utils directory holds essential scripts and helper files needed to run DeepShape. Ensure the following are present in utils:
run_deepshape.py: The main script to run DeepShape.
shape_fasta.py: A helper script for processing FASTA files.
genome_shape_hdf5: Directory containing helper scripts for processing genome shape data.
intervals_sampler_hdf5: Directory containing helper scripts for sampling.
The model directory contains the DeepShape model implementation:
deepshape.py
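The layout described above can be checked before a run. The following sketch assumes the utils and model directories sit directly under the repository root; the helper name missing_entries is hypothetical and not part of DeepShape.

```python
# Sketch: report which expected files or directories are absent.
# Assumes utils/ and model/ sit directly under the repository root.
from pathlib import Path

EXPECTED = [
    "utils/run_deepshape.py",
    "utils/shape_fasta.py",
    "utils/genome_shape_hdf5",
    "utils/intervals_sampler_hdf5",
    "model/deepshape.py",
]


def missing_entries(repo_root):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and directories are present.")
```

Run it from the repository root; an empty result means every listed entry was found.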
Prepare a configuration file that includes all required parameters and paths. An example is available in the config directory.
To run the DeepShape model using the run_deepshape.py script, execute the following command in your terminal:
python utils/run_deepshape.py /ABSOLUTE/PATH/config/train_deepshape.yml
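Because the command above expects an absolute path to the configuration file, a small launcher can resolve the path and fail early if the file is missing. This is a sketch under assumptions: the launcher file name (launch.py) and the build_command helper are hypothetical, and it should be run from the repository root so that utils/run_deepshape.py resolves.

```python
# Sketch: resolve the config path to an absolute path and launch the run.
# launch.py and build_command are hypothetical, not part of DeepShape.
import subprocess
import sys
from pathlib import Path


def build_command(config_path, script="utils/run_deepshape.py"):
    """Return the argument list for running DeepShape with an absolute config path."""
    config = Path(config_path).expanduser().resolve()
    if not config.is_file():
        raise FileNotFoundError(f"Config file not found: {config}")
    return [sys.executable, script, str(config)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python launch.py CONFIG_YAML")
    else:
        # check=True raises CalledProcessError if the run exits with an error.
        subprocess.run(build_command(sys.argv[1]), check=True)
```

Using sys.executable keeps the run inside the Python interpreter of the activated conda environment.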