✨ SMRTnet ✨

This is a PyTorch implementation of our study:

🎯 SMRTnet: Predicting small molecule and RNA target interactions using deep neural networks

Small molecules can bind RNAs to regulate their fate and functions, providing promising opportunities for treating human diseases. However, current tools for predicting small molecule-RNA interactions (SRIs) require prior knowledge of RNA tertiary structures, limiting their utility in drug discovery. Here, we present SMRTnet, a deep learning method to predict SRIs based on RNA secondary structure. By integrating two large language models, convolutional neural networks, graph attention networks, and an attention-based multimodal data fusion model, SMRTnet achieves high performance across multiple experimental benchmarks, substantially outperforming existing state-of-the-art tools.

For wet-lab validation, we conducted a large-scale experimental assessment on SMRTnet predictions for 10 disease-associated RNA targets (e.g. mRNA of undruggable proteins, onco-miRNAs, viral RNAs, and RNA repeat expansions), identifying 40 hits of RNA-targeting small molecules with nanomolar-to-micromolar dissociation constants using microscale thermophoresis (MST). Focusing on the MYC internal ribosome entry site (IRES) as a target, SMRTnet-predicted small molecules showed binding scores correlated closely with observed validation rates. Notably, one predicted compound downregulated MYC expression, inhibited proliferation, and promoted apoptosis in three cancer cell lines.

Taken together, SMRTnet expands the scope of feasible RNA targets and accelerates the discovery and development of RNA-targeting therapeutics.

Overview of SMRTnet

📍 Cite us

If you use this tool in your research, we kindly ask that you cite our paper:

Title: Predicting small molecule–RNA interactions without RNA tertiary structures

Author: Fei Y, Wang P, Zhang J, Shan X, Cai Z, Ma J, Wang Y, and Zhang QC

Journal: Nature Biotechnology, 2026 (5-year Journal Impact Factor: 59.5)

Paper link: https://www.nature.com/articles/s41587-025-02942-z

☎️ Contact us

Please contact us if you are interested in our work or potential academic collaborations.

Dr. Yuhan Fei, School of Life Sciences, Tsinghua University, Postdoc, yuhan_fei@outlook.com
Jiasheng Zhang, School of Life Sciences, Tsinghua University, PhD student, zjs21@mails.tsinghua.edu.cn

📖 Table of contents

1️⃣ Getting Started
- Install via PyPI
- Run SMRTnet via Google Colab
2️⃣ Download pre-trained models
3️⃣ Repo Structure
4️⃣ Datasets
5️⃣ Usage
- How to train your own model
- How to test model performance
- How to perform inference for novel interactions
- How to benchmark on known interactions
- How to identify potential binding sites
6️⃣ Referenced Repos
7️⃣ Copyright and License
8️⃣ Patent Declaration
9️⃣ Disclaimer

Getting started

Please run one of the following commands to check your CUDA version before installing SMRTnet:

nvidia-smi

or

nvcc --version
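
The same check can be scripted. The sketch below is a hypothetical helper (the names `parse_cuda_version` and `detect_cuda_version` are illustrative and not part of SMRTnet) that extracts the `CUDA Version` field from `nvidia-smi` output. It assumes the banner format printed by recent NVIDIA drivers.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'CUDA Version: X.Y' banner field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi if it is on PATH; return None when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print("Detected CUDA version:", detect_cuda_version())
```

Keep in mind that `nvidia-smi` reports the highest CUDA version supported by the driver, whereas `nvcc --version` reports the installed CUDA toolkit, so the two numbers can differ.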

❗ Note: To install Torch and DGL versions compatible with your CUDA setup, please refer to the following URLs:

📌 Install via PyPI

1) Stable version (recommended)

conda create -n smrtnet python=3.8.10
conda activate smrtnet
pip install torch==2.4.1+cu118 torchvision==0.19.1+cu118 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
pip install smrtnet
conda install dglteam/label/th24_cu118::dgl

The stable version of SMRTnet environment is also available on Zenodo (https://zenodo.org/records/14970392) for offline installation.

2) Latest version

conda create -n smrtnet_latest python=3.8.10
conda activate smrtnet_latest
pip install torch torchvision
pip install smrtnet-latest
conda install dglteam/label/th24_cu121::dgl

❗ Note: The table below lists the complete, version-specific dependencies for both installation pathways:

Package	Stable version	Latest version	Remarks
babel	2.17.0	2.17.0	Up to date
charset-normalizer	3.3.2	3.3.2	Required
dgllife	0.3.2	0.3.2	Up to date
dgl	2.4.0.th24.cu118	2.4.0.th24.cu121	Up to date
matplotlib	3.7.5	3.7.5	Constrained by dependencies
networkx	2.8.8	3.1	Constrained by dependencies
huggingface-hub	0.29.1	0.34.4	Up to date
notebook	7.3.2	7.3.3	Constrained by dependencies
numpy	1.20.3	1.24.4	Constrained by dependencies
pandas	1.2.4	2.0.3	Constrained by dependencies
prefetch_generator	1.0.3	1.0.3	Up to date
prettytable	3.11.0	3.11.0	Constrained by dependencies
pytorch-lightning	1.1.5	2.4.0	Constrained by dependencies
python	3.8.10	3.8.10	Required
rdkit	2022.3.5	2022.3.5	Required
scikit-learn	0.24.2	1.3.2	Constrained by dependencies
scipy	1.10.1	1.10.1	Constrained by dependencies
seaborn	0.13.2	0.13.2	Up to date
tensorboard	2.14.0	2.14.0	Constrained by dependencies
tensorboardX	2.6.2.2	2.6.2.2	Constrained by dependencies
torch	2.4.1+cu118	2.4.1	Constrained by dependencies
tqdm	4.67.1	4.67.1	Up to date
transformers	4.28.1	4.28.1	Required
xsmiles	0.2.2	0.2.2	Up to date

‘Required’ denotes that SMRTnet requires this specific version of the indicated package for proper operation;
‘Up to date’ indicates that the dependency is at the latest version of the indicated package;
‘Constrained by dependencies’ explains that, although a newer version is available, compatibility with other dependencies limits the update.
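
To compare an existing environment against the table, a small script along the following lines may help. It is a minimal sketch: the pins are copied from the "Stable version" column for a handful of packages only, the function names are illustrative, and the distribution names are assumed to match those used by pip.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Pins copied from the "Stable version" column above (subset only).
STABLE_PINS: Dict[str, str] = {
    "transformers": "4.28.1",
    "dgllife": "0.3.2",
    "numpy": "1.20.3",
    "pandas": "1.2.4",
    "scikit-learn": "0.24.2",
}


def installed_version(package: str) -> str:
    """Return the installed version string, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> List[str]:
    """List readable mismatches between pinned and installed versions."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    for line in find_mismatches(STABLE_PINS) or ["All checked packages match."]:
        print(line)
```

The `lookup` argument exists only so the comparison can be exercised without touching the real environment.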

❗ Note: We conducted usability tests of both installation methods with a diverse group of users to validate the setup process. The hardware and software details are listed below:

GPUs	Driver version	CUDA version	Stable version	Latest version
H20 (96G)	570.158.01	12.8	✅	✅
RTX 4090 (24G)	570.124.06	12.8	✅	✅
RTX 4090 (24G)	550.135	12.4	✅	✅
RTX 2080 (11G)	535.216.03	12.2	✅	✅
A100 40G (40G)	560.35.03	12.6	✅	✅
A800 80G (80G)	450.248.02	11.0	✅	✅

📌 Run SMRTnet via Google Colab

We have developed an online Jupyter notebook that allows installation-free execution of SMRTnet directly in a web browser via Google Colab. This solution supports both inference and interpretability functionalities while eliminating system-specific installation issues with limited GPU resources.

Step 1: Please click the following link: https://drive.google.com/drive/folders/1HQo3o2saY5U9vPqebz4ZdpCVVQXqw0q_?usp=sharing, and copy the shared folder to your own Google Drive by dragging it into your Drive interface:

Step 2: Please follow the step-by-step instructions provided in the SMRTnet.ipynb notebook to run SMRTnet directly: https://colab.research.google.com/drive/1pm5ZCD8cFRvPA9RPvtEaCHoU1p5X5v4Y?usp=sharing

‼️ If you encounter any issues during the installation process, please report the problem in the Issues section of the repository or contact us directly via email at yuhan_fei@outlook.com or zjs21@mails.tsinghua.edu.cn.

Download pre-trained models from Zenodo

The architecture of SMRTnet

Since the pre-trained models used in SMRTnet are large, we have uploaded them to Zenodo for direct download. Users are required to download the pre-trained models, including the RNA language model (RNASwan-seq), the chemical language model (MoLFormer), and the SMRTnet model, from the link below and place them in the SMRTnet folder (see the “Repo Structure” section below for details).

The pre-trained models used in SMRTnet can be downloaded from Zenodo (https://zenodo.org/records/14715564) and should be placed in the SMRTnet directory.

Alternatively, we provide command lines for downloading the pre-trained models:

# Step 1: Download SMRTnet to your device and change into the `SMRTnet` folder so that it is your current working directory:
git clone https://github.com/Yuhan-Fei/SMRTnet.git
cd SMRTnet

# Step 2: Download and unzip the pre-trained chemical language model (MoLFormer)
wget https://zenodo.org/records/14715564/files/LM_Mol.zip
unzip LM_Mol.zip

# Step 3: Download and unzip the pre-trained RNA language model (RNASwan-seq)
wget https://zenodo.org/records/14715564/files/LM_RNA.zip
unzip LM_RNA.zip

# Step 4: Download and unzip the SMRTnet model into the `results` folder within the SMRTnet directory
wget https://zenodo.org/records/14715564/files/SMRTnet_model.zip
unzip SMRTnet_model.zip -d ./results

💬 The details of each module in SMRTnet

Click here to view the architecture of the RNA language model (RNASwan-seq)

Click here to view the architecture of the two-layer convolution block

Click here to view the architecture of the chemical language model (MoLFormer)

Figure adapted from the MoLFormer publication (https://www.nature.com/articles/s42256-022-00580-7).

Click here to view the architecture of the three-layer graph attention block

Click here to view the architecture of the multimodal data fusion block

Repo Structure:

After downloading all our data, the repo has the following structure:

├── LM_Mol (download from zenodo)
|
├── LM_RNA (download from zenodo)
|
├── results
|   └── SMRTnet_model (download from zenodo)
|          └── SMRTnet_cv1.pth
|          └── SMRTnet_cv2.pth
|          └── SMRTnet_cv3.pth
|          └── SMRTnet_cv4.pth
|          └── SMRTnet_cv5.pth
|          └── config.pkl
|
├── data
|   └── SMRTnet_data.txt
|   └── SMRTnet_benchmark.txt
|   └── SMRTnet_benchmark_NALDB.txt
|   └── SMRTnet_benchmark_SMMRNA.txt
|   └── SMRTnet_benchmark_RSIM.txt
|   └── SMRTnet_benchmark_RBIND.txt
|   └── SMRTnet_benchmark_NewPub.txt
|   └── MYC_RIBOTAC.txt
|   └── MYC_IRES.txt
|   └── natural_compounds.txt
|
├── dataset_cv_best
|   └── test_CV1.txt
|   └── test_CV2.txt
|   └── test_CV3.txt
|   └── test_CV4.txt
|   └── test_CV5.txt
|   └── ...
| 
├── img_log
|
├── LICENSE
├── README.md
├── SMRTnet.ipynb
├── explain.py
├── infer.py
├── inference.py
├── interpret.ipynb
├── loader.py
├── loop.py
├── main.py
├── mergeCV.py
├── model.py
├── requirements.txt
├── utils.py
└── visual.py
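
After downloading, a quick check that the key files landed in the right place can save a failed run. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, only a subset of the tree is checked, and it uses the `results/SMRTnet_model` spelling that appears in the usage commands.

```python
from pathlib import Path
from typing import List

# Paths taken from the tree above, relative to the SMRTnet folder (subset only).
EXPECTED: List[str] = [
    "LM_Mol",
    "LM_RNA",
    "data/SMRTnet_data.txt",
    "results/SMRTnet_model/config.pkl",
] + [f"results/SMRTnet_model/SMRTnet_cv{i}.pth" for i in range(1, 6)]


def missing_paths(root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All checked files and folders are present.")
```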

Datasets

📌 Datasets for training and test:

The training data for SMRTnet is available in the data folder: SMRTnet_data.txt
The test data for SMRTnet, generated with the ligand-based splitting strategy, is available in the dataset_cv_best folder: test_CV1.txt, test_CV2.txt, test_CV3.txt, test_CV4.txt, and test_CV5.txt

The data of SMRTnet is sourced from https://www.rcsb.org/
The raw PDB structural data used in SMRTnet can be downloaded from https://zenodo.org/records/14986116

The training data format is shown below; each RNA sequence and its structure are 31 nt long:

SMILES	Sequence	Structure	label
CC1=CC2=C(CC1)C(=CC3=C2C(=CO3)C)C	GGGGGGGCUUCGCCUCUGGCCCAGCCCUCCC	(((((((((..(((...)))..)))))))))	1
CC1=CC(=O)OC2=C1C=CC(=C2)O	GAUGUUGACUGUUGAAUCUCAUGGCAACACC	(.(((((.((((.(.....)))))))))).)	0
...	...	...	...

Users can pass --do_train or --do_test to main.py to run on these data.
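
Before training on custom data, it can be worth checking each row against the format above. The sketch below is illustrative only: `check_training_row` and `is_balanced` are hypothetical helpers, and the check assumes plain dot-bracket notation in which only `(` and `)` denote base pairs.

```python
from typing import List

WINDOW = 31  # training windows are 31 nt, as stated above


def is_balanced(structure: str) -> bool:
    """True if every ')' closes an earlier '(' and no '(' is left open."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check_training_row(line: str) -> List[str]:
    """Return a list of problems found in one tab-separated data row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return [f"expected 4 tab-separated fields, got {len(fields)}"]
    smiles, sequence, structure, label = fields
    problems = []
    if not smiles:
        problems.append("empty SMILES")
    if len(sequence) != WINDOW:
        problems.append(f"sequence length is {len(sequence)}, expected {WINDOW}")
    if len(structure) != len(sequence):
        problems.append("sequence and structure lengths differ")
    if not is_balanced(structure):
        problems.append("unbalanced parentheses in structure")
    if label not in {"0", "1"}:
        problems.append(f"label must be 0 or 1, got {label!r}")
    return problems
```

Apply `check_training_row` to every line after the header; an empty list means the row passed these checks. SMILES validity is not tested here.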

📌 Datasets for inference:

The demo data for SMRTnet inference is available in the data folder: MYC_IRES.txt and MYC_RIBOTAC.txt

Each RNA should be at least 31 nt long, and the sequence and structure must have the same length. Fields are tab-separated, and the first (header) row is ignored.

RNA	Sequence	Structure
MYC_IRES	GUGGGGGCUUCGCCUCUGGCCCAGCCCUCAC	(((((((((..(((...)))..)))))))))

Each small-molecule SMILES string must be parseable by RDKit. Fields are tab-separated, and the first (header) row is ignored.

CAS	SMILES
3902-71-4	CC1=CC(=O)OC2=C1C=C3C=C(OC3=C2C)C
149-91-7	C1=C(C=C(C(=C1O)O)O)C(=O)O
132201-33-3	C1=CC=C(C=C1)C(C(C(=O)O)O)NC(=O)C2=CC=CC=C2
...	...

Users can pass --do_ensemble or --do_infer to main.py to run inference on these data.
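
To prepare custom inference inputs in the two layouts shown above, something like the following could be used. This is a sketch under stated assumptions: the function names are hypothetical, the header names are copied from the demo files, and SMILES strings are written as given without an RDKit check.

```python
from typing import Iterable, List, Tuple

MIN_RNA_LENGTH = 31  # minimum RNA length stated above


def write_rna_file(path: str, records: Iterable[Tuple[str, str, str]]) -> None:
    """Write (name, sequence, structure) rows after checking the length rules."""
    rows: List[Tuple[str, str, str]] = list(records)
    # Validate everything first so a bad record never leaves a partial file.
    for name, sequence, structure in rows:
        if len(sequence) < MIN_RNA_LENGTH:
            raise ValueError(f"{name}: sequence is shorter than {MIN_RNA_LENGTH} nt")
        if len(sequence) != len(structure):
            raise ValueError(f"{name}: sequence and structure lengths differ")
    with open(path, "w") as handle:
        handle.write("RNA\tSequence\tStructure\n")
        for row in rows:
            handle.write("\t".join(row) + "\n")


def write_compound_file(path: str, records: Iterable[Tuple[str, str]]) -> None:
    """Write (identifier, SMILES) rows; SMILES validity is not checked here."""
    with open(path, "w") as handle:
        handle.write("CAS\tSMILES\n")
        for identifier, smiles in records:
            handle.write(f"{identifier}\t{smiles}\n")
```

The resulting paths can then be passed to `--infer_rna_dir` and `--infer_drug_dir`, as in the usage examples.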

Additionally, we released a curated library of 7,350 natural products and metabolites for drug screening. Specifically, we integrated five natural product libraries from the in-house chemical library of the Center of Pharmaceutical Technology, Tsinghua University (http://cpt.tsinghua.edu.cn/hts/): the Natural Product Library for HTS, the BBP Natural Product Library, the TargetMol Natural Compound Library, the MCE Natural Product Library, and the Pharmacodia Natural Product Library. This library is available in the data folder: natural_compounds.txt

📌 Datasets for benchmarking:

All benchmark datasets for SMRTnet are available in the data folder: SMRTnet_benchmark.txt

We also divided the SMRTnet-benchmark dataset into 5 subsets according to their source databases:

R-BIND (https://rbind.chem.duke.edu/), SMRTnet_benchmark_RBIND.txt
R-SIM (https://web.iitm.ac.in/bioinfo2/R_SIM/), SMRTnet_benchmark_RSIM.txt
SMMRNA (http://www.smmrna.org/), SMRTnet_benchmark_SMMRNA.txt
NALDB (http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php), SMRTnet_benchmark_NALDB.txt
NewPub (https://pubmed.ncbi.nlm.nih.gov/), SMRTnet_benchmark_NewPub.txt

The benchmark data format is shown below; each RNA sequence and its structure should be at least 31 nt long:

Index	SMILES	Sequence	Structure	label
1	C1=NC2=NC(=NC(=C2N1)N)N	GGACAUAUAAUCGCGUGGAUAUGGCACGCAAGUUUCUACCGGGCACCGUAAAUGUCCGAUUAUGUCC	(((((((((..(((...)))..)))))))))	1
2	c12c(ncnc1N)[nH]cn2	GGACAUAUAAUCGCGUGGAUAUGGCACGCAAGUUUCUACCGGGCACCGUAAAUGUCCGAUUAUGUCC	(.(((((.((((.(.....)))))))))).)	0
...	...	...	...	...

Users can pass --do_benchmark to main.py to run on these data.

Usage

Training SMRTnet requires ~14 GB of GPU memory (with batch_size = 32) and takes ~48 hours on the SMRTnet-data dataset with 5-fold cross-validation.
Inference requires ~4 GB of GPU memory (with batch_size = 1) and takes ~25 seconds to predict the binding score of one small molecule-RNA pair using the ensemble scoring strategy on a single GPU.
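
As a rough planning aid, the per-pair figure above can be scaled to a screening run. The sketch below is back-of-the-envelope arithmetic only: it assumes a constant ~25 seconds per pair and, for the multi-GPU case, that work divides evenly across GPUs, which real runs may not achieve. `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(n_pairs: int, seconds_per_pair: float = 25.0, n_gpus: int = 1) -> float:
    """Rough wall-clock hours, assuming constant per-pair cost and ideal GPU scaling."""
    if n_pairs < 0 or n_gpus < 1:
        raise ValueError("n_pairs must be >= 0 and n_gpus must be >= 1")
    return n_pairs * seconds_per_pair / n_gpus / 3600.0


# Screening the 7,350-compound natural product library against one RNA:
print(f"{estimate_hours(7350):.1f} h on one GPU")             # 51.0 h on one GPU
print(f"{estimate_hours(7350, n_gpus=5):.1f} h on five GPUs")  # 10.2 h on five GPUs
```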

♥️ Train your model from scratch

You can run the training using:

python main.py --do_train

We provide the example scripts to train the model from scratch:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

python main.py --do_train \
               --data_dir=./data/SMRTnet_data.txt \
               --cuda 0 \
               --batch_size 32 \
               --out_dir=./results/demo \
               --lm_rna_config ./LM_RNA/parameters.json \
	           --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
	           --lm_mol_config ./LM_Mol/bert_vocab.txt  \
	           --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

♠️ Evaluate performance on the test set

You can run the evaluation using:

python main.py --do_test

We provide the example scripts to test the model:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

python main.py --do_test \
               --data_dir=./dataset_cv_best/test_CV1.txt \
               --infer_config_dir ${DIR}/config.pkl \
               --infer_model_dir ${DIR}/SMRTnet_cv1.pth \
               --cuda 0 \
               --batch_size 1 \
               --out_dir=./results/test \
               --lm_rna_config ./LM_RNA/parameters.json \
	           --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
	           --lm_mol_config ./LM_Mol/bert_vocab.txt  \
	           --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

Note: This example evaluates the model trained on fold 1 of the 5-fold cross-validation (SMRTnet_cv1.pth).
Because SMRTnet makes predictions with an ensemble of the five models from 5-fold cross-validation, evaluate the remaining models by setting the infer_model_dir parameter to SMRTnet_cv2.pth, SMRTnet_cv3.pth, SMRTnet_cv4.pth, and SMRTnet_cv5.pth in turn, each with its matching test set.

For example, to evaluate the SMRTnet_cv2 model on the fold-2 test set:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

python main.py --do_test \
               --data_dir=./dataset_cv_best/test_CV2.txt \
               --infer_config_dir ${DIR}/config.pkl \
               --infer_model_dir ${DIR}/SMRTnet_cv2.pth \
               --cuda 0 \
               --batch_size 1 \
               --out_dir=./results/test \
               --lm_rna_config ./LM_RNA/parameters.json \
	           --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
	           --lm_mol_config ./LM_Mol/bert_vocab.txt  \
	           --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt
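
Rather than editing the command by hand for each fold, the five evaluation commands can be generated in a loop. The following is a minimal sketch that reuses only the flags and paths shown above. `build_test_command` is a hypothetical helper, and the per-fold output folder (`./results/test_cv<N>`) is an assumption made here to keep results separate.

```python
import shlex
from typing import List

MODEL_DIR = "./results/SMRTnet_model"
LM_RNA_MODEL = (
    "./LM_RNA/model_state_dict/"
    "rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_"
    "noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt"
)


def build_test_command(fold: int, cuda: int = 0) -> List[str]:
    """Build the `main.py --do_test` argument list for one cross-validation fold."""
    if fold not in range(1, 6):
        raise ValueError("fold must be between 1 and 5")
    return [
        "python", "main.py", "--do_test",
        f"--data_dir=./dataset_cv_best/test_CV{fold}.txt",
        "--infer_config_dir", f"{MODEL_DIR}/config.pkl",
        "--infer_model_dir", f"{MODEL_DIR}/SMRTnet_cv{fold}.pth",
        "--cuda", str(cuda),
        "--batch_size", "1",
        f"--out_dir=./results/test_cv{fold}",  # assumed per-fold output folder
        "--lm_rna_config", "./LM_RNA/parameters.json",
        "--lm_rna_model", LM_RNA_MODEL,
        "--lm_mol_config", "./LM_Mol/bert_vocab.txt",
        "--lm_mol_model", "./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt",
    ]


if __name__ == "__main__":
    for fold in range(1, 6):
        print(shlex.join(build_test_command(fold)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` from within the `SMRTnet` folder.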

♦️ Make predictions for novel interactions

SMRTnet uses an ensemble scoring strategy to make predictions based on the 5 models from 5-fold cross-validation.

You can perform inference using two approaches. The difference between them lies in whether multiple GPUs are used.

python main.py --do_ensemble

or

python main.py --do_infer

1. The ensemble scoring strategy: We provide the example scripts to perform inference with a single GPU:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

python main.py --do_ensemble \
               --cuda 0 \
               --infer_config_dir ${DIR}/config.pkl \
               --infer_model_dir ${DIR} \
               --infer_out_dir ./data/ensemble \
               --infer_rna_dir ./data/MYC_IRES.txt \
               --infer_drug_dir ./data/MYC_RIBOTAC.txt \
               --lm_rna_config ./LM_RNA/parameters.json \
	           --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
	           --lm_mol_config ./LM_Mol/bert_vocab.txt  \
	           --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

2. The parallel ensemble scoring strategy: We also provide example scripts to accelerate inference by running on multiple GPUs simultaneously:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

#1. To run the fold-1 model on GPU No. 1, use the following command:
CV=1
mkdir -p ./results/MYC_with_RiboTac/CV_1
python main.py --do_infer \
               --cuda 1 \
               --infer_config_dir ${DIR}/config.pkl \
			   --infer_model_dir ${DIR}/SMRTnet_cv1.pth \
    		   --infer_out_dir ./results/MYC_with_RiboTac/CV_1/results.txt \
    	       --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

#2. To run the fold-2 model on GPU No. 2, use the following command:
CV=2
mkdir -p ./results/MYC_with_RiboTac/CV_2
python main.py --do_infer \
               --cuda 2 \
    		   --infer_config_dir ${DIR}/config.pkl \
    		   --infer_model_dir ${DIR}/SMRTnet_cv2.pth \
    		   --infer_out_dir ./results/MYC_with_RiboTac/CV_2/results.txt \
    		   --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

#3. To run the fold-3 model on GPU No. 3, use the following command:
CV=3
mkdir -p ./results/MYC_with_RiboTac/CV_3
python main.py --do_infer \
			   --cuda 3 \
    		   --infer_config_dir ${DIR}/config.pkl \
    		   --infer_model_dir ${DIR}/SMRTnet_cv3.pth \
    		   --infer_out_dir ./results/MYC_with_RiboTac/CV_3/results.txt \
    		   --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

#4. To run the fold-4 model on GPU No. 4, use the following command:
CV=4
mkdir -p ./results/MYC_with_RiboTac/CV_4
python main.py --do_infer \
               --cuda 4 \
               --infer_config_dir ${DIR}/config.pkl \
    		   --infer_model_dir ${DIR}/SMRTnet_cv4.pth \
    		   --infer_out_dir ./results/MYC_with_RiboTac/CV_4/results.txt \
    		   --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

#5. To run the fold-5 model on GPU No. 5, use the following command:
CV=5
mkdir -p ./results/MYC_with_RiboTac/CV_5
python main.py --do_infer \
		       --cuda 5 \
    		   --infer_config_dir ${DIR}/config.pkl \
    		   --infer_model_dir ${DIR}/SMRTnet_cv5.pth \
		       --infer_out_dir ./results/MYC_with_RiboTac/CV_5/results.txt \
    		   --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

#6. Compute the final binding scores by taking the median across the outputs of all five models:

python mergeCV.py --data_dir ./results/MYC_with_RiboTac --results_name results
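
The median rule itself is simple to illustrate. The sketch below is not `mergeCV.py` and does not read its files, whose format is not documented here. It only shows the median-across-folds idea on in-memory scores, using a hypothetical `merge_by_median` helper and made-up score values.

```python
from statistics import median
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (RNA name, compound identifier)


def merge_by_median(fold_scores: List[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Median score per pair across folds; pairs missing from any fold are skipped."""
    if not fold_scores:
        return {}
    shared = set(fold_scores[0])
    for scores in fold_scores[1:]:
        shared &= set(scores)
    return {
        pair: median(scores[pair] for scores in fold_scores)
        for pair in sorted(shared)
    }


# Made-up scores from five fold models for a single pair:
folds = [{("MYC_IRES", "149-91-7"): value} for value in (0.1, 0.5, 0.9, 0.4, 0.6)]
print(merge_by_median(folds))  # {('MYC_IRES', '149-91-7'): 0.5}
```

Compared with the mean, the median is less sensitive to a single fold model producing an outlying score.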

♣️ Perform benchmarking on known interactions

You can run the benchmarking with the following command:

python main.py --do_benchmark

SMRTnet uses an ensemble scoring strategy, combining outputs from the five models of 5-fold cross-validation to benchmark known small molecule-RNA pairs. In the example, we use the SMRTnet_benchmark_RBIND.txt dataset, but you can replace it with any of the other benchmark datasets available in the data folder.

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

python main.py --do_benchmark \
			   --cuda 0 \
               --infer_config_dir ${DIR}/config.pkl \
               --infer_model_dir ${DIR} \
               --infer_out_dir ./results/RBIND \
               --data_dir ./data/SMRTnet_benchmark_RBIND.txt \
               --lm_rna_config ./LM_RNA/parameters.json \
	           --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
	           --lm_mol_config ./LM_Mol/bert_vocab.txt  \
	           --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt
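
Once binding scores are available for labelled pairs, a ranking metric can summarise how well binders are separated from non-binders. This document does not state which metric `--do_benchmark` reports, so the sketch below uses AUROC purely as a familiar example, computed in plain Python from labels and scores. `auroc` is a hypothetical helper and the numbers are made up.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: both binders score above both non-binders.
print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # 1.0
```

The pairwise loop is quadratic in the number of pairs, which is fine for small benchmark files; for large sets a rank-based implementation would be preferable.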

💎 Identify potential binding sites on RNA

To compute high-attention regions using the trained models, you can run the following command and visualize the results in a Jupyter Notebook.

python main.py --do_explain

We provide example scripts to perform model interpretability analyses:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

DIR=./results/SMRTnet_model

python main.py --do_explain \
    		   --cuda 0 \
    		   --infer_config_dir ${DIR}/config.pkl \
    		   --infer_model_dir ${DIR} \
               --infer_out_dir ./results/MYC \
    		   --infer_rna_dir ./data/MYC_IRES.txt \
    		   --infer_drug_dir ./data/MYC_RIBOTAC.txt --smooth_steps 3 \
    		   --lm_rna_config ./LM_RNA/parameters.json \
    		   --lm_rna_model ./LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt \
    		   --lm_mol_config ./LM_Mol/bert_vocab.txt  \
    		   --lm_mol_model ./LM_Mol/pretrained/checkpoints/N-Step-Checkpoint_3_30000.ckpt

You can run interpret.ipynb after executing the command above to visualize the potential binding sites on RNA as below:

🔔 Extract RNA sequence embeddings with our RNA language model

We developed an RNA language model, RNASwan-seq, for learning RNA sequence representations. The model consisted of 30 transformer encoder blocks with Rotary Positional Embeddings. Each block includes a feed-forward layer with a hidden size of 640 and 20 attention heads. The model was trained using masked language modeling to recover the original masked tokens using cross-entropy loss.

We provide example scripts to extract embeddings of given RNA sequences from the RNASwan-seq for use in other downstream applications:

# Note: Please ensure that your current working directory is set to the `SMRTnet` folder.

from smrtnet.utils import tailor_batch
from transformers import EsmModel as pretrain_bert
from transformers import EsmConfig
import torch

## Set device
cuda=0
device = torch.device("cuda:"+str(cuda) if torch.cuda.is_available() else "cpu")


## Prepare RNA sequences
data = [
    ("Seq1", "CUCAUAUAAUCGCGUGGAUAUGGCACGCGAGUUUCUACCGGGCACCGUAAAUGUCCGACUAUGGG"),
    ("Seq2", "GUGGGGGCUUCGCCUCUGGCCCAGCCCUCAC"),
]
batch_data = tailor_batch([x for (_,x) in data])


## Load RNASwan-seq model
lm_rna_config = './LM_RNA/parameters.json'
lm_rna_model = './LM_RNA/model_state_dict/rnaall_img0_min30_lr5e5_bs30_2w_7136294_norm1_05_1025_150M_16_rope_fa2_noropeflash_eps1e6_aucgave_1213/epoch_0/LMmodel.pt'
lm_ft = True

configuration_pretrain = EsmConfig.from_pretrained(lm_rna_config)
RNASwan_seq = pretrain_bert(configuration_pretrain).to(device)
dict_para_pretrain = torch.load(lm_rna_model, map_location=device)  # load onto the selected device (also works on CPU-only machines)

for name_, para_ in RNASwan_seq.state_dict().items():
    if 'esm.' + name_ in dict_para_pretrain.keys():
        RNASwan_seq.state_dict()[name_].copy_(dict_para_pretrain['esm.' + name_])
for para in RNASwan_seq.parameters():
    if lm_ft:
        para.requires_grad = True
    else:
        para.requires_grad = False


## Extract embeddings
RNASwan_seq.eval()
with torch.no_grad():
    re_input_ids = torch.tensor(batch_data['input_ids']).to(device)
    re_atten_mask = torch.tensor(batch_data['attention_mask']).to(device)
    # Request a ModelOutput explicitly so the hidden states can be read by name
    outputs = RNASwan_seq(input_ids=re_input_ids.long(), attention_mask=re_atten_mask, return_dict=True)

v_Pe = outputs.last_hidden_state
token_embeddings = v_Pe[:, 1:, :]
print(token_embeddings[0]) #Print embeddings for `Seq1`
print(token_embeddings[1]) #Print embeddings for `Seq2`

Referenced Repos

MoLFormer: https://github.com/IBM/molformer
Convolutional neural networks: LeNet and AlexNet
Residual neural networks: https://doi.org/10.48550/arXiv.1512.03385
Graph Attention networks: https://github.com/awslabs/dgl-lifesci
Transformer: https://doi.org/10.48550/arXiv.1706.03762
OPENBABEL: https://github.com/openbabel/openbabel and web server
atomium: https://github.com/samirelanduk/atomium
DSSR: http://home.x3dna.org/

Copyright and License

This project is free to use for non-commercial purposes - see the LICENSE file for details.

Patent Declaration

Patent Name: Method, system, storage medium, and device for predicting small molecule-RNA interactions (小分子和RNA互作关系的预测方法、系统、存储介质和设备)
Inventors: Qiangfeng Cliff Zhang (张强锋), Yuhan Fei (费宇涵), Jiasheng Zhang (张佳胜)
Publication Number: CN120199320A
Publication Date: 2025-06-24
Related URL: https://m.tianyancha.com/zhuanli/51f993cb151b132315faa8d150a6a981

Disclaimer

The predictions generated by SMRTnet should be carefully reviewed by experts before proceeding to wet-lab validation. As the work is still under active development and subject to certain limitations, the predicted drugs should not be used directly.
