A transformer-based protein language diffusion model that generates all-atom IDP ensembles and disordered IDR ensembles while maintaining the folded domains.
To get started, this repository must be cloned using the following command:

```bash
git clone https://github.com/THGLab/IDPForge.git
```

Following that, the working conda environment can be established in two ways.
The base environment can be built manually via the environment.yml file in the repo. To do this, run the following commands:

```bash
conda env create -f environment.yml
pip install -e .
```

Note: The default file is set to install torch==2.5.1 and cuda==12.1 for earlier GPUs (sm_60 - sm_80). Optionally, this may be changed to install torch==2.7.1 and cuda==12.8 for later-generation GPUs (sm_60 - sm_120). Refer to the comments in the file for modification instructions.
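As a quick sanity check that the installed torch/CUDA combination can see your GPU, something like the following sketch can be run (it assumes the environment created from environment.yml is named idpforge, the name activated in the training section below):

```bash
conda activate idpforge
# Prints the torch version, the CUDA version it was built against, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```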
This repo also requires OpenFold utilities, so that repository must be cloned in the same directory as IDPForge using the following command:

```bash
git clone https://github.com/aqlaboratory/openfold.git
```

Once the repository is cloned, proceed into the openfold/openfold/resources directory and run the following:

```bash
wget https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
```

Once this is done, proceed back to openfold/ and do the following (a consolidated sketch follows the list):
- Replace the provided setup.py with whichever of the openfold_setup_12.X.py files in the IDPForge/dockerfiles directory corresponds to the CUDA version chosen earlier.
- Rename it to setup.py within the openfold repository.
- Install it via pip install -e .
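Concretely, these steps amount to something like the sketch below, run from inside openfold/. The filename openfold_setup_12.1.py is an assumption for the CUDA 12.1 choice; substitute the 12.X variant you picked:

```bash
# Filename is an assumption; use the openfold_setup_12.X.py matching your CUDA version
cp ../IDPForge/dockerfiles/openfold_setup_12.1.py setup.py
pip install -e .
```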
This makes the environment fully ready for use.
If you have issues setting up the base environment from the yml file, or if you are setting up IDPForge for use on an HPC cluster, it is recommended to follow OpenFold's own installation procedure. To do this, start by cloning both repositories in the same directory.
```bash
git clone https://github.com/THGLab/IDPForge.git
git clone https://github.com/aqlaboratory/openfold.git
```

Then proceed into openfold/ and create the OpenFold environment using the following command:

```bash
mamba env create -n openfold_env -f environment.yml
```

Install the other dependencies required by IDPForge using the following command:

```bash
conda install einops mdtraj -c conda-forge
```

If this installation pathway is chosen, it is also recommended to uninstall flash-attn (pip uninstall flash-attn) when starting out.
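As a quick check that the extra dependencies resolved into the new environment, a minimal sketch using the environment name created above:

```bash
conda activate openfold_env
# Verify both packages import cleanly
python -c "import einops, mdtraj; print('dependencies ok')"
```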
This makes the environment fully ready for use.
Note: For more information on OpenFold installation, please refer to the installation guide: https://openfold.readthedocs.io/en/latest/Installation.html
ESM2 utilities are refactored into this repo for the network modules and for exploring the effects of ESM embeddings on IDP modeling. Alternatively, ESM can be installed from its GitHub repository, https://github.com/facebookresearch/esm.git, or via pip install fair-esm.
Optional: pip install flash-attn==2.3 to speed up attention calculation.
IDPForge can also be built as a Docker container using either of the included dockerfiles (Blackwell or Ampere). Blackwell runs on CUDA 12.8 and Ampere on CUDA 12.1. Model weights, example training data, and other inference input files can be downloaded from Figshare. Optionally, these files may be merged into the repository before the image is created; this ensures the image contains the merged files, removing the need for additional /weights and /data mounting.
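The merge can be as simple as placing the downloaded files at the paths the examples below expect. A sketch, where the download location and file layout are assumptions, while weights/mdl.ckpt and the data/ files match the paths used later in this README:

```bash
# Run from the root of the cloned IDPForge repository; source paths are hypothetical
mkdir -p weights data
mv ~/Downloads/mdl.ckpt weights/
mv ~/Downloads/sic1_pre_exp.txt ~/Downloads/AF-P05231-F1-model_v4.pdb data/
```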
To build the image, run the following command from the root of this repository, choosing either Blackwell or Ampere based on preference:

```bash
docker build -f dockerfiles/Dockerfile_[Blackwell/Ampere] -t idpforge:latest .
```

To confirm that your idpforge:latest image was built successfully, run:

```bash
docker images
```

To run a container from the newly created image, run:

```bash
docker run --rm -it --gpus all idpforge:latest
```

To verify that your Docker installation can properly communicate with your GPU, run:

```bash
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```

Once the image is created, outside directories can be added to a container by mounting them as follows:
```bash
docker run --rm -it --gpus all \
  -v "[path-to-directory]":/app/[directory-name-in-container] \
  idpforge:latest
```

Any other mounts can be added as additional -v flags before the image name. Examples of this are given in later sections.
We use pytorch-lightning for training, and one can customize training via the documented flags under trainer in the config file.

```bash
conda activate idpforge
python train.py --model_config_path configs/train.yml
```

We provide a command-line interface to sample single-chain IDP/IDRs.
```
usage: sample_idp.py [-h] [--batch BATCH] [--nconf NCONF] [--cuda]
                     seq ckpt_path output_dir sample_cfg

positional arguments:
  seq            protein sequence
  ckpt_path      path to model weights
  output_dir     directory to output pdbs
  sample_cfg     path to a sampling configuration yaml file

optional arguments:
  --batch BATCH  batch size
  --nconf NCONF  number of conformers to sample
  --cuda         whether to use cuda or cpu
```
Example to generate 100 conformers for Sic1:

```bash
mkdir test
sequence="GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT"
python sample_idp.py $sequence weights/mdl.ckpt test configs/sample.yml --nconf 100 --cuda
```

Inference-time experimental guidance can be activated via the potential flag in configs/sample.yml. An example PRE experimental data file is also provided in data/sic1_pre_exp.txt.
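Once sampling finishes, the ensemble size can be checked by counting the generated structures, assuming one PDB file is written per conformer to the output directory (per the output_dir description above):

```bash
ls test/*.pdb | wc -l   # expect 100
```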
This can also be run within the previously created docker image. Set the working directory to the root of the previously cloned and merged version of this repository and run the following.
```bash
mkdir test
sequence="GSMTPSTPPRSRGTRYLAQPSGNTSSSALMQGQKTPQKPSQNLVPVTPSTTKSFKNAPLLAPPNSNMGMTSPFNGLTSPQRSPFPKSSVKRT"
docker run -it --rm --gpus all \
  -v "./test/":/app/output \
  -v "./data/":/app/data \
  -v "./weights/":/app/weights \
  -w /app \
  idpforge:latest \
  python -u /app/sample_idp.py $sequence /app/weights/mdl.ckpt /app/output /app/configs/sample.yml --nconf 100 --cuda
```

To sample IDRs in the context of folded domains, first prepare the folded template by running mk_ldr_template.py. We provide an example for sampling the low-confidence region of AF entry P05231:
```bash
python mk_ldr_template.py data/AF-P05231-F1-model_v4.pdb 1-41 data/AF-P05231_ndr.npz
```

The provided model weights are not recommended for predicting multiple domains at the same time.
Then, to generate an ensemble of IDRs with folded domains, run:

```bash
mkdir P05231_build
python sample_ldr.py weights/mdl.ckpt data/AF-P05231_ndr.npz P05231_build configs/sample.yml --nconf 100 --cuda
```

One can set attention_chunk to manage memory usage for long sequences (inference on long disordered sequences may be limited by the training sequence length).
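For long constructs, memory pressure can also be reduced by sampling in smaller batches. The sketch below assumes sample_ldr.py accepts the same --batch flag documented for sample_idp.py above, which this README does not confirm:

```bash
# --batch here is an assumption carried over from the sample_idp.py interface
python sample_ldr.py weights/mdl.ckpt data/AF-P05231_ndr.npz P05231_build configs/sample.yml \
    --nconf 100 --batch 10 --cuda
```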
This can also be run within the previously created docker image. Set the working directory to the root of the previously cloned and merged version of this repository and run the following.
```bash
mkdir P05231_build
docker run -it --rm --gpus all \
  -v "./P05231_build/":/app/output \
  -v "./data/":/app/data \
  -v "./weights/":/app/weights \
  -w /app \
  idpforge:latest \
  python -u /app/sample_ldr.py /app/weights/mdl.ckpt /app/data/AF-P05231_ndr.npz /app/output /app/configs/sample.yml --nconf 100 --cuda
```

We use UCBShift for chemical shift prediction; it can be installed from https://github.com/THGLab/CSpred.git. If you wish to use X-EISD for evaluation or reweighting with experimental data, please refer to https://github.com/THGLab/X-EISDv2.
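For example, UCBShift can be fetched alongside the other repositories (a sketch; follow the CSpred README for its own environment setup):

```bash
git clone https://github.com/THGLab/CSpred.git
```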
```bibtex
@article{zhang2025,
  author = {Zhang, Oufan and Liu, Zi-Hao and Forman-Kay, Julie D. and Head-Gordon, Teresa},
  title = {Deep Learning of Proteins with Local and Global Regions of Disorder},
  journal = {arXiv preprint},
  year = {2025},
  archivePrefix = {arXiv},
  eprint = {2502.11326},
}
```