Changes from all commits
128 commits
df96b58
make sure padded asym_id won't affect permutation steps
dingquanyu Feb 15, 2024
d74b09c
fixed bugs in unittests for multi-chain permutation. now working on e…
dingquanyu Feb 15, 2024
aa18a56
remove unnecessary lines
dingquanyu Feb 15, 2024
2c56566
restore to the verison on main
dingquanyu Feb 15, 2024
7df201e
added typing hints and fixed some comments
dingquanyu Feb 16, 2024
17f24bd
Added custom template folder
rostro36 Feb 20, 2024
170d9c5
make sure no padded features are going to be selected as anchors
dingquanyu Feb 20, 2024
10b6838
Fix always is_custom_template.
rostro36 Mar 1, 2024
e9bacd8
Less dependent on input sequnece, use template length instead
rostro36 Mar 2, 2024
50f8617
Support for multiple custom templates.
rostro36 Mar 2, 2024
77860bb
Improve type hints and formatting
ljarosch Mar 19, 2024
e678050
Add default shard number
ljarosch Mar 19, 2024
ee0c5db
Add duplicate chain file support to alignment DB script
ljarosch Mar 20, 2024
94819bf
Add script for expanding the alignment dir with duplicates
ljarosch Mar 20, 2024
295d0d5
Fixed documentation according to comments.
rostro36 Mar 20, 2024
e41e651
Initial commit for sphinx documentation.
jnwei Mar 20, 2024
8dfe77e
fixed typing errors; added more comments
dingquanyu Mar 21, 2024
2dbc8c0
added comments
dingquanyu Mar 21, 2024
6b4f167
Added alignment method.
rostro36 Mar 25, 2024
30813a3
Fix Colab by using OF commit from pl_upgrades
vaclavhanzl Apr 27, 2024
e2479cb
Add more efficient script to generate all-seqs FASTA
ljarosch May 6, 2024
0b5c949
Give script more descriptive name
ljarosch May 6, 2024
244970b
Slightly improve comment
ljarosch May 6, 2024
78b9706
Set CLI description to more informative module docstring
ljarosch May 6, 2024
04410d5
Improve import formatting
ljarosch May 6, 2024
a432a93
Rough draft dump of docs and readthedocs build
jnwei May 8, 2024
fd294f8
fix typo in readthedocs.yaml
jnwei May 8, 2024
6766815
replace doc environment pip dependencies with conda builds
jnwei May 8, 2024
b87946a
cleanup makefiles and original readme
jnwei May 8, 2024
087cf9f
updates to Inference.md
jnwei May 8, 2024
60cfbcc
Add addtional inference pages
jnwei May 8, 2024
24dc3af
add convert v1 weights instructions
jnwei May 8, 2024
72bb51e
Adds FAQ section
jnwei May 8, 2024
6c25555
creates link to FAQ in documentation
jnwei May 8, 2024
d22a354
small edits to main page
jnwei May 8, 2024
945ecc0
minor language edits
jnwei May 8, 2024
02493a8
Hotfix to switch order of feature dict generation
christinaflo Feb 23, 2024
cc6deaa
Added script for running decoy ranking experiments
sachinkadyan7 Feb 26, 2024
9660a43
Fix distributed seeding behavior
ljarosch Mar 19, 2024
5a7b024
Add LICENSE information to decoy ranking script
sachinkadyan7 Mar 26, 2024
b5db6c3
Update README to include details for SoloSeq Embeddings
sachinkadyan7 Mar 26, 2024
6d1ca26
Fix usage example in `download_openfold_soloseq_params.sh`
sachinkadyan7 Mar 26, 2024
453ae89
Add script to download embeddings for training SoloSeq
sachinkadyan7 Mar 26, 2024
7e76be0
Fix resolution field in mmcif_parsing
ljarosch Mar 21, 2024
60eff63
Adds mkl version to environment.yml
jnwei Apr 30, 2024
cf367df
make space for docker CI
jnwei May 6, 2024
7f8d124
Shorten README.md main page.
jnwei May 9, 2024
9a6deab
adds mmseqs2 to environment.yml for clustering
jnwei May 9, 2024
11d5fdf
Update training OpenFold docs with correct paths.
jnwei May 10, 2024
d61f585
Adds example directory
jnwei May 10, 2024
61191bf
update comments;fixed typos
dingquanyu May 10, 2024
5f78237
Update tests and comments
dingquanyu May 10, 2024
15113dc
fixed typing error of anchor_gt_residue
dingquanyu May 10, 2024
55c293c
Update test_permutation.py
jnwei May 11, 2024
9d88b8e
Merge pull request #406 from dingquanyu/update-permutation-unittest
jnwei May 11, 2024
17b8c14
make sure padded asym_id won't affect permutation steps
dingquanyu Feb 15, 2024
0df04f3
fixed bugs in unittests for multi-chain permutation. now working on e…
dingquanyu Feb 15, 2024
d1a32aa
remove unnecessary lines
dingquanyu Feb 15, 2024
77cb413
restore to the verison on main
dingquanyu Feb 15, 2024
f7571e2
added typing hints and fixed some comments
dingquanyu Feb 16, 2024
6d41838
make sure no padded features are going to be selected as anchors
dingquanyu Feb 20, 2024
04d6378
fixed typing errors; added more comments
dingquanyu Mar 21, 2024
bc24032
added comments
dingquanyu Mar 21, 2024
9a6eb64
update comments;fixed typos
dingquanyu May 10, 2024
6f1329e
Update tests and comments
dingquanyu May 10, 2024
d968098
fixed typing error of anchor_gt_residue
dingquanyu May 10, 2024
32765d2
Update test_permutation.py
jnwei May 11, 2024
f561cec
Merge branch 'docs' into setup-improvements
jnwei May 11, 2024
fcb7796
edits to inference documentation and script for parameters
jnwei May 13, 2024
6cba403
fix typo in environment.yml
jnwei May 13, 2024
29b5823
Merge pull request #419 from aqlaboratory/setup-improvements_addition…
jnwei May 13, 2024
a3c1319
Update OpenFold.ipynb to newest pl_upgrades commit
jnwei May 13, 2024
3eef7ca
Merge pull request #432 from vaclavhanzl/fix-colab-change-used-commit
jnwei May 13, 2024
6706864
Merge branch 'main' into setup-improvements
jnwei May 13, 2024
49804a5
small fix to docker-image ci workflow
jnwei May 13, 2024
6384d79
Removes mpi4py from environment.yml -- not supported for Docker builds
jnwei May 13, 2024
ffc9b3f
updates biopython version to 1.80 in colab notebook
jnwei May 13, 2024
97dae6c
Updates biopython version to 1.83 in colab notebook, which seems to w…
jnwei May 13, 2024
d8117ce
in scripts/utils.py account for case where no conda environment is sp…
jnwei May 13, 2024
f434a27
Merge pull request #439 from aqlaboratory/setup-improvements
jnwei May 13, 2024
d2ffb8d
Link OpenFold repo from the docs
vaclavhanzl May 14, 2024
feb45a5
Merge pull request #440 from vaclavhanzl/patch-1
jnwei May 14, 2024
6af8158
hotfix: Fix installation documentation
jnwei May 15, 2024
775b57e
Fix formatting in installation.md
jnwei May 17, 2024
3c1fd31
Merge pull request #443 from jnwei/main
jnwei May 17, 2024
734ebc4
Update Aux_seq_files.md
jnwei May 23, 2024
f6c875b
Merge pull request #448 from aqlaboratory/aux_seq_files_update
jnwei May 23, 2024
d3c89fc
Fix 3-to-1 letter conversion to use extended mapping
ljarosch Jul 6, 2024
f37d0d9
Merge pull request #464 from ljarosch/main
jnwei Jul 8, 2024
8b5212d
Remove unnecessary double-load of config_json.
ryan-attunely Jul 17, 2024
0dafd62
Update documentation.
rostro36 Jul 18, 2024
b38b607
Change path of Inference.md
rostro36 Jul 18, 2024
7d22739
Merge branch 'main' into main
rostro36 Jul 18, 2024
c48f850
Merge pull request #408 from rostro36/main
jnwei Jul 18, 2024
6f63267
Merge pull request #470 from rkosai/main
jnwei Jul 18, 2024
e6ce9c9
Update Installation.md - fix pl_upgrades clone instructions
vaclavhanzl Aug 24, 2024
b79ca29
Merge pull request #479 from vaclavhanzl/patch-3
jnwei Nov 7, 2024
cde001f
add environment variables
etowahadams Nov 13, 2024
8ece4f3
Merge pull request #502 from etowahadams/etowahadams/install-docs
jnwei Nov 14, 2024
625ade9
Revert "docs: Add env var instructions to install guide "
jnwei Nov 14, 2024
e605ec8
Merge pull request #504 from aqlaboratory/revert-502-etowahadams/inst…
ljarosch Nov 14, 2024
a01a60f
docs: env variable
etowahadams Nov 14, 2024
9f09442
fix: formatting
etowahadams Nov 14, 2024
6a43510
fix: formatting
etowahadams Nov 14, 2024
f1d0ae7
try quotes
etowahadams Nov 14, 2024
c05a354
fix: formatting
etowahadams Nov 14, 2024
56277ea
fix: formatting
etowahadams Nov 14, 2024
a1192c8
Merge pull request #505 from etowahadams/update-install
jnwei Dec 4, 2024
a364def
Minor typo fix in Installation.md
nenuadrian Dec 26, 2024
e8d3558
updated script
etowahadams Feb 23, 2025
26d1a5d
Merge pull request #520 from etowahadams/etowahadams/update-script
jnwei Feb 24, 2025
815a042
Merge pull request #516 from nenuadrian/patch-1
jnwei Feb 24, 2025
100a309
Maintainance to pl_upgrades
jnwei Apr 23, 2025
ab4a245
Merge branch 'main' into pl_upgrades
jnwei Apr 23, 2025
0c2d455
fix environment to support tests
jnwei Apr 23, 2025
9caf30a
Change casting for deepspeed compare model test to fp32
jnwei Apr 24, 2025
7e06ed9
support openmm>8 and fix tolerance units in amber minimization
jnwei Apr 24, 2025
cb899a5
Merge pull request #2 from aqlaboratory/pl_upgrades
jnwei Apr 24, 2025
da37880
Allow numpy>2 and support compute capability >9
jnwei Apr 24, 2025
4312aec
Add link to issue for deepspeed_evo_attention test.
jnwei Apr 25, 2025
0672517
Update installation docs to build CUDA12 version
jnwei Apr 25, 2025
16af434
fix inference documentation
jnwei Apr 25, 2025
fe10216
update version number.
jnwei Apr 25, 2025
a5433c3
Update config.py
jnwei Apr 25, 2025
620a54f
Update amber_minimize.py
jnwei Apr 25, 2025
1ffd197
Merge pull request #533 from jnwei/pl_upgrades
jnwei Apr 25, 2025
50a2e75
Update OpenFold notebook to updated pytorch2 commit
jnwei Apr 26, 2025
c587b06
add Open In Colab banner to notebook.
jnwei Apr 26, 2025
2 changes: 1 addition & 1 deletion .github/workflows/docker-image.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Cleanup # https://github.com/actions/virtual-environments/issues/2840
run: sudo rm -rf /usr/share/dotnet && sudo rm -rf /opt/ghc && sudo rm -rf "/usr/local/share/boost" && sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Build the Docker image
run: docker build . --file Dockerfile --tag openfold:$(date +%s)
6 changes: 3 additions & 3 deletions docs/source/Aux_seq_files.md
@@ -68,9 +68,9 @@ All together, the file directory would look like:
└── 6kwc.cif
└── alignment_db
├── alignment_db_0.db
├── alignment_db_1.db
...
├── alignment_db_9.db
└── alignment_db.index
```

17 changes: 9 additions & 8 deletions docs/source/Inference.md
@@ -42,7 +42,7 @@ $ bash scripts/download_openfold_params.sh $PARAMS_DIR

We recommend selecting `openfold/resources` as the params directory, as this is the default directory used by `run_pretrained_openfold.py` to locate parameters.

If you choose to use a different directory, you may make a symlink to the `openfold/resources` directory, or specify an alternate parameter path with the command line argument `--jax_param_path` for AlphaFold parameters or `--openfold_checkpoint_path` for OpenFold parameters.


### Model Inference
@@ -62,7 +62,7 @@
$TEMPLATE_MMCIF_DIR
--output_dir $OUTPUT_DIR \
--config_preset model_1_ptm \
--uniref90_database_path $BASE_DATA_DIR/uniref90/uniref90.fasta \
--mgnify_database_path $BASE_DATA_DIR/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path $BASE_DATA_DIR/pdb70 \
--uniclust30_database_path $BASE_DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
@@ -138,6 +138,7 @@ Some commonly used command line flags are here. A full list of flags can be view
- `--data_random_seed`: Specifies a random seed to use.
- `--save_outputs`: Saves a copy of all outputs from the model, e.g. the output of the msa track, ptm heads.
- `--experiment_config_json`: Specify configuration settings using a json file. For example, passing a json with `{"globals": {"relax": {"max_iterations": 10}}}` specifies 10 as the maximum number of relaxation iterations. See [`openfold/config.py`](https://github.com/aqlaboratory/openfold/blob/main/openfold/config.py#L283) for the full dictionary of configuration settings. Any parameters that are not manually set in these configuration settings will refer to the defaults specified by your `config_preset`.
- `--use_custom_template`: Uses all .cif files in `template_mmcif_dir` as template input. Make sure the chains of interest have the identifier _A_ and have the same length as the input sequence. The same templates will be read for all sequences that are passed for inference.
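The JSON-override behavior described above can be pictured with a small stand-alone sketch; the `deep_merge` helper and the key names here are illustrative, not OpenFold's actual implementation:

```python
import json

# Defaults as they might appear in the config tree (values from config.py's relax section).
defaults = {"globals": {"relax": {"max_iterations": 0, "tolerance": 10.0}}}

# A user-supplied override file, as it would be passed to --experiment_config_json.
override = json.loads('{"globals": {"relax": {"max_iterations": 10}}}')

def deep_merge(base: dict, upd: dict) -> dict:
    """Recursively layer `upd` over `base`, keeping unspecified defaults."""
    merged = dict(base)
    for key, val in upd.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], val)
        else:
            merged[key] = val
    return merged

merged = deep_merge(defaults, override)
# max_iterations is overridden; tolerance falls back to the preset default.
```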


### Advanced Options for Increasing Efficiency
Expand All @@ -159,12 +160,12 @@ Note that chunking (as defined in section 1.11.8 of the AlphaFold 2 supplement)
#### Long sequence inference
To minimize memory usage during inference on long sequences, consider the following changes:

- As noted in the AlphaFold-Multimer paper, the AlphaFold/OpenFold template stack is a major memory bottleneck for inference on long sequences. OpenFold supports two mutually exclusive inference modes to address this issue. One, `average_templates` in the `template` section of the config, is similar to the solution offered by AlphaFold-Multimer, which is simply to average individual template representations. Our version is modified slightly to accommodate weights trained using the standard template algorithm. Using said weights, we notice no significant difference in performance between our averaged template embeddings and the standard ones. The second, `offload_templates`, temporarily offloads individual template embeddings into CPU memory. The former is an approximation while the latter is slightly slower; both are memory-efficient and allow the model to utilize arbitrarily many templates across sequence lengths. Both are disabled by default, and it is up to the user to determine which best suits their needs, if either.
- Inference-time low-memory attention (LMA) can be enabled in the model config. This setting trades off speed for vastly improved memory usage. By default, LMA is run with query and key chunk sizes of 1024 and 4096, respectively. These represent a favorable tradeoff in most memory-constrained cases. Power users can choose to tweak these settings in `openfold/model/primitives.py`. For more information on the LMA algorithm, see the aforementioned Staats & Rabe preprint.
- Disable `tune_chunk_size` for long sequences. Past a certain point, it only wastes time.
- As a last resort, consider enabling `offload_inference`. This enables more extensive CPU offloading at various bottlenecks throughout the model.
- Disable FlashAttention, which seems unstable on long sequences.

Using the most conservative settings, we were able to run inference on a 4600-residue complex with a single A100. Compared to AlphaFold's own memory offloading mode, ours is considerably faster; the same complex takes the more efficient AlphaFold-Multimer more than double the time. Use the `long_sequence_inference` config option to enable all of these interventions at once. The `run_pretrained_openfold.py` script can enable this config option with the `--long_sequence_inference` command line option.

Input FASTA files containing multiple sequences are treated as complexes. In this case, the inference script runs AlphaFold-Gap, a hack proposed [here](https://twitter.com/minkbaek/status/1417538291709071362?lang=en), using the specified stock AlphaFold/OpenFold parameters (NOT AlphaFold-Multimer).
26 changes: 16 additions & 10 deletions docs/source/installation.md → docs/source/Installation.md
@@ -4,7 +4,7 @@ In this guide, we will install OpenFold and its dependencies.

**Pre-requisites**

This package is currently supported for CUDA 12 and Pytorch 2. All dependencies are listed in the [`environment.yml`](https://github.com/aqlaboratory/openfold/blob/main/environment.yml).

At this time, only Linux systems are supported.

@@ -19,10 +19,17 @@ At this time, only Linux systems are supported.
Mamba is recommended because the dependencies required by OpenFold are quite large and mamba can speed up the installation.
- Activate the environment, e.g `conda activate openfold_env`
1. Run the setup script to configure kernels and folding resources.
> scripts/install_third_party_dependencies.sh
1. Prepend the conda environment to `$LIBRARY_PATH` and `$LD_LIBRARY_PATH`, e.g.

```
export LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
```

You may optionally set this as a conda environment variable according to the [conda docs](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#saving-environment-variables) to activate each time the environment is used.

1. Download parameters. We recommend using `openfold/resources` as the destination, as our unit tests will look for the weights there.
- For AlphaFold2 weights, use
> ./scripts/download_alphafold_params.sh <dest>
- For OpenFold weights, use :
@@ -46,10 +46,9 @@ Certain tests perform equivalence comparisons with the AlphaFold implementation.

## Environment specific modifications

### MPI
To use OpenFold with MPI support, you will need to add the package [`mpi4py`](https://pypi.org/project/mpi4py/). This can be done with pip in your OpenFold environment, e.g. `$ pip install mpi4py`.


### Install OpenFold parameters without aws
If you don't have access to `aws` on your system, you can use a different download source:
@@ -59,4 +65,4 @@ If you don't have access to `aws` on your system, you can use a different downlo

### Docker setup

A [`Dockerfile`](https://github.com/aqlaboratory/openfold/blob/main/Dockerfile) is provided to build an OpenFold Docker image. Additional notes for setting up a docker container for OpenFold and running inference can be found [here](original_readme.md#building-and-using-the-docker-container).
9 changes: 4 additions & 5 deletions docs/source/Multimer_Inference.md
@@ -72,8 +72,7 @@ python3 run_pretrained_openfold.py \
--output_dir ./
```

**Notes:**
- Template searching in the multimer pipeline uses HMMSearch with the PDB SeqRes database, replacing HHSearch and PDB70 used in the monomer pipeline.
- As with monomer inference, if you've already computed alignments for the query, you can use the `--use_precomputed_alignments` option.
- At this time, only AlphaFold parameter weights are available for multimer mode.
4 changes: 2 additions & 2 deletions docs/source/index.md
@@ -5,7 +5,7 @@
:align: center
:alt: Comparison of OpenFold and AlphaFold2 predictions to the experimental structure of PDB 7KDX, chain B._
```
Welcome to the Documentation for [OpenFold](https://github.com/aqlaboratory/openfold), the fully open source, trainable, PyTorch-based reproduction of DeepMind's
[AlphaFold 2](https://github.com/deepmind/alphafold).

Here, you will find guides and documentation for:
@@ -115,4 +115,4 @@ Aux_seq_files.md
OpenFold_Parameters.md
FAQ.md
original_readme.md
```
17 changes: 8 additions & 9 deletions environment.yml
@@ -8,34 +8,33 @@ dependencies:
- cuda
- gcc=12.4
- python=3.10
- setuptools=59.5.0
- pip
- openmm
- pdbfixer
- pytorch-lightning
- biopython
- numpy
- pandas
- PyYAML
- requests
- scipy
- tqdm
- typing-extensions
- wandb
- modelcif==0.7
- awscli
- ml-collections
- aria2
- mkl
- git
- bioconda::hmmer
- bioconda::hhsuite
- bioconda::kalign2
- pytorch::pytorch=2.5
- pytorch::pytorch-cuda=12.4
- pip:
- deepspeed==0.14.5
- dm-tree==0.1.6
- git+https://github.com/NVIDIA/dllogger.git
- flash-attn
25 changes: 17 additions & 8 deletions notebooks/OpenFold.ipynb
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/aqlaboratory/OpenFold/blob/main/notebooks/OpenFold.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -107,11 +117,11 @@
"\n",
"python_version = f\"{version_info.major}.{version_info.minor}\"\n",
"\n",
"\n",
"os.system(\"wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh\")\n",
"os.system(\"bash Miniforge3-Linux-x86_64.sh -bfp /usr/local\")\n",
"os.environ[\"PATH\"] = \"/usr/local/bin:\" + os.environ[\"PATH\"]\n",
"os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=8.2.0 python={python_version} pdbfixer biopython=1.83\")\n",
"os.system(\"pip install -q torch ml_collections py3Dmol modelcif\")\n",
"\n",
"try:\n",
@@ -127,7 +137,7 @@
"\n",
" %shell mkdir -p /content/openfold/openfold/resources\n",
"\n",
" commit = \"1ffd197489aa5f35a5fbce1f00d7dd49bce1bd2f\"\n",
" os.system(f\"pip install -q git+https://github.com/aqlaboratory/openfold.git@{commit}\")\n",
"\n",
" os.system(f\"cp -f -p /content/stereo_chemical_props.txt /usr/local/lib/python{python_version}/site-packages/openfold/resources/\")\n",
@@ -893,8 +903,7 @@
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"display_name": "Python 3",
@@ -907,4 +916,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
2 changes: 1 addition & 1 deletion openfold/config.py
@@ -660,7 +660,7 @@ def model_config(
},
"relax": {
"max_iterations": 0, # no max
"tolerance": 10.0,
"stiffness": 10.0,
"max_outer_iterations": 20,
"exclude_residues": [],
19 changes: 16 additions & 3 deletions openfold/data/data_pipeline.py
@@ -23,8 +23,19 @@
from typing import Mapping, Optional, Sequence, Any, MutableMapping, Union
import numpy as np
import torch
from openfold.data import (
templates,
parsers,
mmcif_parsing,
msa_identifiers,
msa_pairing,
feature_processing_multimer,
)
from openfold.data.templates import (
get_custom_template_features,
empty_template_feats,
CustomHitFeaturizer,
)
from openfold.data.tools import jackhmmer, hhblits, hhsearch, hmmsearch
from openfold.np import residue_constants, protein

Expand All @@ -38,7 +49,9 @@ def make_template_features(
template_featurizer: Any,
) -> FeatureDict:
hits_cat = sum(hits.values(), [])
if template_featurizer is None or (
len(hits_cat) == 0 and not isinstance(template_featurizer, CustomHitFeaturizer)
):
template_features = empty_template_feats(len(input_sequence))
else:
templates_result = template_featurizer.get_templates(
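The rewritten guard in `make_template_features` can be sketched in isolation; `CustomHitFeaturizer` is stubbed below and the helper name is invented for illustration:

```python
# Stand-in for openfold.data.templates.CustomHitFeaturizer: a featurizer that
# reads templates directly from files and therefore does not depend on hit lists.
class CustomHitFeaturizer:
    pass

def needs_empty_template_feats(hits_cat: list, template_featurizer) -> bool:
    # Emit empty template features only when no featurizer is configured, or
    # when there are no hits AND the featurizer is not a CustomHitFeaturizer
    # (which should still run even with an empty hit list).
    return template_featurizer is None or (
        len(hits_cat) == 0
        and not isinstance(template_featurizer, CustomHitFeaturizer)
    )
```

This is why the condition was reordered: the old `len(hits_cat) == 0` short-circuit would have skipped custom templates whenever no search hits were found.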
2 changes: 1 addition & 1 deletion openfold/data/mmcif_parsing.py
@@ -283,7 +283,7 @@ def parse(
author_chain = mmcif_to_author_chain_id[chain_id]
seq = []
for monomer in seq_info:
code = PDBData.protein_letters_3to1_extended.get(monomer.id, "X")
seq.append(code if len(code) == 1 else "X")
seq = "".join(seq)
author_chain_to_sequence[author_chain] = seq
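A minimal sketch of why the extended mapping matters for this parsing fix; the dictionaries below are tiny illustrative subsets (the real tables live in Biopython's `Bio.Data.PDBData`):

```python
# Standard mapping covers only the 20 canonical residues.
protein_letters_3to1 = {"ALA": "A", "MET": "M"}

# The extended mapping adds modified residues, e.g. selenomethionine (MSE),
# which is common in crystal structures and maps to M.
protein_letters_3to1_extended = {
    **protein_letters_3to1,
    "MSE": "M",
}

def residue_code(monomer_id: str) -> str:
    # Mirrors the parsing logic: unknown or multi-letter codes become "X".
    code = protein_letters_3to1_extended.get(monomer_id, "X")
    return code if len(code) == 1 else "X"
```

With the standard table, `MSE` would have fallen through to `"X"`, corrupting sequences parsed from mmCIF files that contain modified residues.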