InertDB

A Comprehensive Database of Biologically Inactive Compounds

Overview

InertDB is a curated chemical database designed to address the lack of biologically inactive compounds in predictive modeling for AI-based drug discovery. This limitation often leads to biased datasets dominated by active compounds, reducing the diversity and robustness of machine learning models.

InertDB bridges this gap by providing:

Curated Inactive Compounds (CICs): 3,205 inactive compounds rigorously curated from PubChem BioAssays.
Generated Inactive Compounds (GICs): 64,368 potential inactive compounds generated using deep generative AI trained on the CICs.

By offering a comprehensive resource for biologically inactive small molecules and expanding the chemical space with GICs, InertDB aims to enhance the robustness and accuracy of predictive AI models in toxicology and pharmacology.

Key Features

Diverse Assays: CICs are extracted from over 260 million PubChem bioassay results, leveraging an NLP-based assay diversity metric.
AI-Generated Inactives: GICs supplement chemical space using RNN-based deep generative AI (inertdb_generator.py).
Low PAINS Content: Contains few pan-assay interference compounds (PAINS), which are frequent false positives in high-throughput screening.
Drug-Like Properties: CICs exhibit physicochemical properties comparable to approved drugs.
Validated Performance: Predictive modeling benchmarks (LIT-PCBA and MUV) show significant improvements.
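
As an illustration of how inactive compounds might serve as the negative class in a predictive model, the sketch below assembles a labeled dataset from a list of actives and a list of InertDB inactives. This is a minimal sketch and not necessarily the protocol used in the benchmarks above: the function name `build_labeled_set`, the `ratio` parameter, and the exact-string overlap check are all assumptions made for illustration.

```python
import random


def build_labeled_set(actives, inactives, ratio=1.0, seed=0):
    """Label actives as 1 and a random sample of inactives as 0.

    `ratio` (hypothetical parameter) is the number of inactives drawn per
    active. Overlap between the two lists is detected by exact string match,
    so both lists are assumed to use the same SMILES canonicalization.
    """
    active_set = set(actives)
    # Drop any inactive that also appears among the actives.
    pool = sorted(set(inactives) - active_set)
    n_negatives = min(len(pool), int(round(ratio * len(active_set))))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    negatives = rng.sample(pool, n_negatives)

    data = [(smi, 1) for smi in sorted(active_set)]
    data += [(smi, 0) for smi in negatives]
    rng.shuffle(data)
    return data
```

The returned list of `(smiles, label)` pairs can then be featurized with whatever descriptor or fingerprint pipeline the model already uses.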

Repository Structure

InertDB/
├── data/                 # Pre-processed datasets of CICs and GICs
│   ├── inertdb_cic_v2024.03.smi
│   └── inertdb_gic_v2024.03.smi
│
├── inertdb_generator.py  # Script for generating additional GICs
└── README.md             # Project documentation (this file)

Usage

1. Download Pre-Processed InertDB Datasets

Download the CICs and GICs datasets:

wget https://raw.githubusercontent.com/ann081993/InertDB/main/data/inertdb_cic_v2024.03.smi
wget https://raw.githubusercontent.com/ann081993/InertDB/main/data/inertdb_gic_v2024.03.smi
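
Once downloaded, the files can be read with a few lines of Python. The sketch below assumes the common `.smi` layout, in which each non-empty line starts with a SMILES string that may be followed by whitespace-separated columns such as an identifier. The exact column layout of the InertDB files is not documented here, so check a few lines of the file first; a header row, if present, would need to be skipped. The function names are illustrative.

```python
from pathlib import Path


def parse_smi_lines(lines):
    """Return unique SMILES in first-seen order from .smi-style lines.

    Assumption: the first whitespace-separated token on each non-empty line
    is the SMILES string; any further columns are ignored.
    """
    seen = set()
    smiles = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token = line.split()[0]
        if token not in seen:
            seen.add(token)
            smiles.append(token)
    return smiles


def read_smi(path):
    """Read a .smi file from disk and return its unique SMILES strings."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_smi_lines(handle)
```

For example, `len(read_smi("inertdb_cic_v2024.03.smi"))` gives a count that can be compared against the number of CICs reported in the Overview.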

2. Generate Additional GICs

Use the provided script to generate new GICs using the pre-trained generative AI model.

1. Requirements

Ensure the following Python packages are installed, or install the dependencies from requirements.txt:

tensorflow
numpy
rdkit

conda create -n inertdb python=3.10
conda activate inertdb
pip install -r requirements.txt
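
To confirm that the environment is ready before running the generator, a quick check along these lines may help. It is a minimal sketch that only tests whether each package can be located, not whether its version is compatible; the import names are assumed to match the package names listed above.

```python
import importlib.util

# Import names for the packages listed above (assumed to match the package names).
REQUIRED_MODULES = ("tensorflow", "numpy", "rdkit")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```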

2. Run the Script

Generate additional GICs by specifying the number of iterations:

python inertdb_generator.py -n NUM_GENERATIONS -o OUTPUT_FILE

NUM_GENERATIONS: Number of iterations to generate (each iteration produces 1,000 SMILES).
OUTPUT_FILE: Name of the file to save the generated GICs (default: gic.txt).

Example:

python inertdb_generator.py -n 5 -o my_gics.txt

This generates up to 5,000 SMILES strings and saves the valid, unique SMILES to my_gics.txt.
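
Because only valid, unique SMILES are saved, the output file usually holds fewer entries than the upper bound. The sketch below works through that arithmetic and shows one way to keep only the generated SMILES that are not already in a reference set such as the released GICs. The helper names are hypothetical, and the comparison is an exact string match, which assumes both sets were written with the same SMILES canonicalization.

```python
SMILES_PER_ITERATION = 1000  # stated above: each iteration produces 1,000 SMILES


def max_generated(num_generations):
    """Upper bound on the number of SMILES produced for a given -n value."""
    return num_generations * SMILES_PER_ITERATION


def kept_fraction(num_generations, num_saved):
    """Fraction of generated SMILES that survived the validity/uniqueness filter."""
    upper = max_generated(num_generations)
    if upper <= 0:
        raise ValueError("num_generations must be positive")
    return num_saved / upper


def novel_smiles(generated, known):
    """Return generated SMILES absent from `known`, deduplicated, in order."""
    known_set = set(known)
    seen = set()
    novel = []
    for smi in generated:
        if smi not in known_set and smi not in seen:
            seen.add(smi)
            novel.append(smi)
    return novel
```

With `-n 5`, `max_generated(5)` is 5,000; if `my_gics.txt` ends up with 4,000 lines, `kept_fraction(5, 4000)` is 0.8.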

Citation

If you use InertDB in your research, please consider citing the following publication:

@article{An2025,
  author    = {Seungchan An and Yeonjin Lee and Junpyo Gong and Seokyoung Hwang and In Guk Park and Jayhyun Cho and Min Ju Lee and Minkyu Kim and Yun Pyo Kang and Minsoo Noh},
  title     = {InertDB as a generative AI-expanded resource of biologically inactive small molecules from PubChem},
  journal   = {Journal of Cheminformatics},
  year      = {2025},
  volume    = {17},
  pages     = {49},
  doi       = {10.1186/s13321-025-00999-1},
  url       = {https://doi.org/10.1186/s13321-025-00999-1}
}

License

InertDB is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The curated dataset is freely available for academic and non-commercial research purposes. Commercial use requires a license agreement; please contact [ann081993 at snu dot ac dot kr] or refer to the LICENSE file for details.
