Improving Covalent and Non-Covalent Molecule Generation via Reinforcement Learning with Functional Fragments

Abstract

Small-molecule drugs play a critical role in cancer therapy by selectively targeting key signaling pathways that drive tumor growth. While deep learning models have advanced drug discovery, generative frameworks for fragment-based de novo design of covalent molecules are still lacking. To address this, we propose MOFF (MOlecule generation with Functional Fragments), a reinforcement learning framework that generates both covalent and non-covalent compounds from functional fragments. The model uses docking scores as the reward function and is trained with the Soft Actor-Critic algorithm. We evaluate MOFF in case studies targeting Bruton's tyrosine kinase (BTK) and the epidermal growth factor receptor (EGFR), demonstrating that MOFF can generate ligand-like molecules with favorable docking scores and drug-like properties compared to baseline models and ChEMBL compounds. As a computational validation, molecular dynamics (MD) simulations were conducted on selected top-scoring molecules to assess potential binding stability. These results highlight MOFF as a flexible and extensible framework for fragment-based molecule generation, with the potential to support downstream applications.

Environment Setup

Install required dependencies using the provided environment.yml file:

conda env create -f environment.yml
conda activate moff

This project also requires AutoDock-GPU for docking. Follow the AutoDock-GPU installation guide to compile it. Once built, copy the binary into the ./bin directory; the example scripts below prepend bin to PATH so that the docking step can find it.
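
Before launching a run, it can help to confirm that a docking executable is actually present in ./bin. The sketch below assumes the compiled binary keeps AutoDock-GPU's usual `autodock_gpu*` naming; that pattern and the helper name `find_docking_binaries` are illustrative assumptions, not part of this repository, so adjust them to match your build.

```python
import os
from pathlib import Path


def find_docking_binaries(bin_dir="bin", pattern="autodock_gpu*"):
    """Return executable files in bin_dir whose names match pattern.

    The default pattern is an assumption about how the AutoDock-GPU
    build names its executables; it is not taken from this repository.
    """
    root = Path(bin_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.glob(pattern)
        if p.is_file() and os.access(p, os.X_OK)
    )


if __name__ == "__main__":
    found = find_docking_binaries()
    if found:
        print("Docking binaries found:", ", ".join(p.name for p in found))
    else:
        print("No executable matching 'autodock_gpu*' in ./bin")
```

Run it from the repository root, since the scripts add the relative path bin to PATH.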

Running Molecular Generation

We provide two example shell scripts for covalent and non-covalent molecule generation using reinforcement learning guided by docking scores.

Covalent Molecule Generation (Target: BTK)

Save as run_cov.sh:

export PATH="bin:$PATH"  # lets the docking step find the AutoDock-GPU binary in ./bin; run from the repository root

CUDA_LAUNCH_BLOCKING=1 python3 run_rl.py \
    --name='c1_3267' \
    --load=0 --train=1 --has_feature=1 \
    --name_full_load='' \
    --min_action=1 --max_action=4 \
    --gnn_aggregate='sum' --gnn_type='GCN' \
    --seed=3267 --intr_rew=0 --intr_rew_ratio=5e-1 \
    --update_after=3000 --start_steps=4000 --update_every=256 --init_alpha=1. \
    --is_covalent=1 --lipinski_rew=0 \
    --desc='ecfp' \
    --rl_model='sac' \
    --active_learning='moff' \
    --gpu_id=0 --emb_size=96 --tau=.1 --batch_size=256 --target_entropy=0.1 \
    --munchausen=1 --alpha_min=0.1 --init_alpha_lr=5e-4 \
    --step_list 0 3 1 2 \
    --receptor_pdb='gym_molecule/maps_file/5p9j/5p9j.pdb' \
    --covlent_amino_acid='A:CYS:481' \
    --receptor_maps='gym_molecule/maps_file/5p9j/5p9j_rigid.maps.fld'
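
The value passed to --covlent_amino_acid looks like a CHAIN:RESNAME:RESNUM triple ('A:CYS:481' for BTK). Under that assumption, the sketch below parses such a string and checks that the residue exists in a receptor PDB file, using the fixed-column layout of ATOM/HETATM records. `ResidueSpec`, `parse_residue_spec`, and `residue_in_pdb` are hypothetical helpers for pre-flight checking; how run_rl.py interprets the flag internally is not shown here.

```python
from typing import Iterable, NamedTuple


class ResidueSpec(NamedTuple):
    chain: str
    resname: str
    resnum: int


def parse_residue_spec(spec: str) -> ResidueSpec:
    """Split a 'CHAIN:RESNAME:RESNUM' string such as 'A:CYS:481'."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected CHAIN:RESNAME:RESNUM, got {spec!r}")
    chain, resname, resnum = parts
    return ResidueSpec(chain, resname.upper(), int(resnum))


def residue_in_pdb(pdb_lines: Iterable[str], spec: ResidueSpec) -> bool:
    """Return True if any ATOM/HETATM record matches the residue.

    PDB fixed columns (1-based): residue name 18-20, chain 22,
    residue number 23-26.
    """
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        if (line[17:20].strip() == spec.resname
                and line[21] == spec.chain
                and line[22:26].strip() == str(spec.resnum)):
            return True
    return False


if __name__ == "__main__":
    spec = parse_residue_spec("A:CYS:481")
    with open("gym_molecule/maps_file/5p9j/5p9j.pdb") as fh:
        print(spec, "found" if residue_in_pdb(fh, spec) else "missing")
```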

Non-Covalent Molecule Generation (Target: BTK)

Save as run_noncov.sh:

export PATH="bin:$PATH"  # lets the docking step find the AutoDock-GPU binary in ./bin; run from the repository root

CUDA_LAUNCH_BLOCKING=1 python3 run_rl.py \
    --name='n1_8848' \
    --load=0 --train=1 --has_feature=1 \
    --name_full_load='' \
    --min_action=1 --max_action=4 \
    --gnn_aggregate='sum' --gnn_type='GCN' \
    --seed=8848 --intr_rew=0 --intr_rew_ratio=5e-1 \
    --update_after=3000 --start_steps=4000 --update_every=256 --init_alpha=1. \
    --is_covalent=0 --lipinski_rew=0 \
    --desc='ecfp' \
    --rl_model='sac' \
    --active_learning='moff' \
    --gpu_id=0 --emb_size=96 --tau=.1 --batch_size=256 --target_entropy=0.1 \
    --munchausen=1 --alpha_min=0.1 --init_alpha_lr=5e-4 \
    --step_list 0 2 1 2 \
    --receptor_maps='gym_molecule/maps_file/6e4f/6e4f_protein.maps.fld'

Run with EGFR Target

To target EGFR instead of BTK, swap the receptor arguments in the scripts above.

Covalent (in run_cov.sh):

--receptor_pdb='gym_molecule/maps_file/2j5f_cov/2j5f_protein.pdb' \
--covlent_amino_acid='A:CYS:797' \
--receptor_maps='gym_molecule/maps_file/2j5f_cov/2j5f_protein_rigid.maps.fld'

Non-covalent (in run_noncov.sh):

--receptor_maps='gym_molecule/maps_file/2j5f/2j5f_protein.maps.fld'
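
The four target and mode combinations differ only in their receptor arguments, so they can be kept in one lookup table. The sketch below collects the paths and residue strings listed in this README and turns them into command-line flags; the table `RECEPTOR_ARGS` and the function `receptor_flags` are hypothetical conveniences, not part of run_rl.py.

```python
# Receptor arguments copied from the example scripts in this README.
RECEPTOR_ARGS = {
    ("BTK", True): {
        "receptor_pdb": "gym_molecule/maps_file/5p9j/5p9j.pdb",
        "covlent_amino_acid": "A:CYS:481",
        "receptor_maps": "gym_molecule/maps_file/5p9j/5p9j_rigid.maps.fld",
    },
    ("BTK", False): {
        "receptor_maps": "gym_molecule/maps_file/6e4f/6e4f_protein.maps.fld",
    },
    ("EGFR", True): {
        "receptor_pdb": "gym_molecule/maps_file/2j5f_cov/2j5f_protein.pdb",
        "covlent_amino_acid": "A:CYS:797",
        "receptor_maps": "gym_molecule/maps_file/2j5f_cov/2j5f_protein_rigid.maps.fld",
    },
    ("EGFR", False): {
        "receptor_maps": "gym_molecule/maps_file/2j5f/2j5f_protein.maps.fld",
    },
}


def receptor_flags(target: str, covalent: bool) -> list:
    """Build the receptor-related flags for one target and docking mode."""
    args = RECEPTOR_ARGS[(target.upper(), bool(covalent))]
    flags = [f"--is_covalent={int(bool(covalent))}"]
    flags += [f"--{name}={value}" for name, value in args.items()]
    return flags


if __name__ == "__main__":
    for flag in receptor_flags("EGFR", covalent=True):
        print(flag)
```

The values carry no shell quotes because they are meant to be passed as a list (for example to `subprocess.run`), where the single quotes used in the shell scripts are unnecessary.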

Extendibility

Fragments used to build molecules are stored in:

./gym_molecule/dataset/*.txt

Each text file corresponds to one functional fragment type (e.g., warhead or linker).
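
To inspect the fragment library before editing it, the files can be read into a dictionary keyed by file name. This is a minimal sketch that assumes each file holds one fragment per line and that blank lines can be ignored; the actual file format and the loader used by the repository may differ, and `load_fragment_files` is a hypothetical name.

```python
from pathlib import Path


def load_fragment_files(dataset_dir="gym_molecule/dataset"):
    """Map each *.txt file stem to its list of non-empty, stripped lines.

    Assumes one fragment per line; this layout is an assumption, not
    something documented in this README.
    """
    fragments = {}
    for path in sorted(Path(dataset_dir).glob("*.txt")):
        with path.open() as fh:
            fragments[path.stem] = [ln.strip() for ln in fh if ln.strip()]
    return fragments


if __name__ == "__main__":
    for name, frags in load_fragment_files().items():
        print(f"{name}: {len(frags)} fragments")
```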

You can change the build logic using:

--step_list 0 2 1 2

The list gives the order in which fragment types are assembled during molecule generation. Each number is the index of one of your fragment categories, and an index may appear more than once (as 2 does here).
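
As an illustration of how such an index list selects categories, the sketch below maps a step list onto an ordered list of category names. The category names and their ordering are placeholders chosen for the example; the real mapping from index to fragment file is defined by the repository, and `build_order` is a hypothetical helper.

```python
def build_order(step_list, categories):
    """Return the category name used at each assembly step.

    Raises IndexError if a step refers to a category that does not exist.
    """
    order = []
    for step, idx in enumerate(step_list):
        if not 0 <= idx < len(categories):
            raise IndexError(
                f"step {step}: index {idx} outside 0..{len(categories) - 1}"
            )
        order.append(categories[idx])
    return order


if __name__ == "__main__":
    # Placeholder names; substitute the categories you actually use.
    categories = ["category_0", "category_1", "category_2", "category_3"]
    print(build_order([0, 2, 1, 2], categories))
    # ['category_0', 'category_2', 'category_1', 'category_2']
```

Validating the indices up front catches a step list that points past the available categories before a long training run starts.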

Extend to new protein receptors:

Prepare your receptor .pdb file and generate the docking .maps.fld files, following the AutoDock-Vina or AutoDock-GPU documentation:
- Basic docking
- Covalent docking
Then replace --receptor_maps in the script and, for covalent generation, also --receptor_pdb and --covlent_amino_acid.
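
A common failure after preparing a new receptor is a .maps.fld file that points at grid maps which were moved or never generated. The sketch below assumes the AutoGrid-style layout in which the field file references its companion files through `file=NAME` entries resolved relative to the .maps.fld location; `missing_map_files` is a hypothetical helper, and the exact file layout should be checked against the AutoDock documentation.

```python
import re
from pathlib import Path

# Matches entries such as "file=protein.A.map" in a .maps.fld file.
_FILE_REF = re.compile(r"\bfile=(\S+)")


def missing_map_files(fld_path):
    """List files referenced by a .maps.fld file that do not exist.

    References are resolved relative to the directory of the .fld file,
    which is an assumption about how the maps were generated.
    """
    fld = Path(fld_path)
    missing = []
    for line in fld.read_text().splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = _FILE_REF.search(line)
        if match is None:
            continue
        name = match.group(1)
        if not (fld.parent / name).exists() and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    fld = "gym_molecule/maps_file/6e4f/6e4f_protein.maps.fld"
    gaps = missing_map_files(fld)
    print("All referenced files present" if not gaps else f"Missing: {gaps}")
```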
