NeurIDA: Dynamic Modeling For Effective In-Database Analytics

Source code for the paper "NeurIDA: Dynamic Modeling For Effective In-Database Analytics", in particular the algorithmic components of Dynamic In-Database Modeling (DIME).

Dynamic In-Database Modeling

(Figure: overview of the DIME framework)

We present the DIME modeling framework, focusing on its composable base-model architecture and execution flow. When model augmentation is invoked, DIME executes a modeling pipeline tailored to the specific analytical task: it first builds a relational graph containing tuples from the target table and related tables, then dynamically constructs a bespoke model for this graph from the selected base model and shared model components, and finally uses the constructed model to generate predictions for tuples in the target table.
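The three-stage flow described above can be sketched in plain Python. This is a minimal illustration of the pipeline shape only; every name here (`build_relational_graph`, `compose_model`, `predict`) is hypothetical and does not correspond to the repository's actual API.

```python
# Hypothetical sketch of DIME's three stages; names are illustrative only.

def build_relational_graph(target_table, related_tables):
    """Stage 1: gather tuples from the target table and its related tables."""
    return {"target": target_table, "related": related_tables}

def compose_model(graph, base_encoder):
    """Stage 2: assemble a bespoke model around the chosen base encoder.

    A real implementation would also wire in shared relational components;
    here the 'model' simply applies the base encoder to each tuple.
    """
    def model(tuple_row):
        return base_encoder(tuple_row)
    return model

def predict(graph, model):
    """Stage 3: generate predictions for tuples in the target table."""
    return [model(row) for row in graph["target"]]

# Toy usage: a trivial "encoder" that sums a tuple's numeric features.
graph = build_relational_graph(target_table=[[1, 2], [3, 4]], related_tables=[])
model = compose_model(graph, base_encoder=sum)
print(predict(graph, model))  # [3, 7]
```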

Structure

neurida/
├── model/                      # Core model implementations
│   ├── aida.py                # Main AIDA framework (AIDAXFormer, relation modules, encoders)
│   ├── rdb.py                 # RDB baseline model with HeteroGraphSAGE
│   ├── base.py                # Base architecture components
│   ├── tabular/               # Tabular encoders (TabM, DeepFM, ARMNet, etc.)
│   └── layer/                 # Custom layers (fusion, relation convolution)
├── aida/                      # AIDA experiment framework
│   ├── aida_run.py           # Main training script
│   ├── prompt/               # LLM-based prompt generation
│   ├── db/                   # Database profiling utilities
│   └── run_*.sh              # Experiment shell scripts
├── utils/                     # Utility functions
│   ├── data/                 # Dataset implementations and factory
│   ├── builder.py            # Graph construction utilities
│   ├── sample.py             # Neighbor sampling
│   └── preprocess.py         # Type inference and preprocessing
├── cmds/                      # Command-line tools for baselines
├── data/                      # Data directory (download required)
└── environment.yml            # Conda environment specification

Installation and Data Setup

Environment Setup

# Clone the repository
git clone <repository-url>
cd neurida

# Create and activate conda environment
conda env create -f environment.yml
conda activate deepdb

Data Preparation

Supported Datasets:

  • H&M Fashion (hm): Fashion retail transactions
  • Avito (avito): Online classifieds platform
  • Event (event): Event attendance data
  • RateBeer (ratebeer): Beer ratings and reviews
  • OLIST (olist): Brazilian e-commerce
  • Trial (trial): Medical trial outcomes
  • Stack Overflow (stack): Developer engagement

The data/ directory contains two main types of artifacts:

  1. Tabular Data: Flattened relational data with various feature engineering levels
  2. TensorFrame Data: Materialized database graph structures stored as PyTorch tensors

Data files are excluded from git. You can download them from the official website or they will be generated automatically on first run. See data/README.md for more details.
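Since the TensorFrame caches are generated on first run when absent, a script may want to check for them up front. The sketch below is an assumption about the cache layout, inferred only from the `--tf_cache_dir data/hm-tensor-frame` argument shown later in this README; the helper is not part of the repository.

```python
# Hypothetical helper: detect whether a materialized TensorFrame cache exists.
# The "<db_name>-tensor-frame" directory naming is an assumption inferred from
# the --tf_cache_dir example in this README.
from pathlib import Path

def tensorframe_cache_ready(db_name: str, data_root: str = "data") -> bool:
    """Return True if the cache directory for db_name exists and is non-empty."""
    cache_dir = Path(data_root) / f"{db_name}-tensor-frame"
    return cache_dir.is_dir() and any(cache_dir.iterdir())

# Example: warn before a run that will trigger regeneration.
if not tensorframe_cache_ready("hm"):
    print("TensorFrame cache missing; it will be generated on first run")
```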

Script Execution

Main Experiments

Run full experiments across all datasets and encoders:

bash aida/run_aida_experiments.sh

This runs experiments with multiple base encoders (mlp, tabm, dfm, resnet, fttrans) on classification and regression tasks.

Single dataset/encoder experiments:

bash aida/run_aida_single_dataset.sh    # Test on a specific dataset
bash aida/run_aida_single_encoder.sh    # Test a specific encoder

Ablation studies:

bash aida/run_aida_ablation.sh          # Test impact of model components
bash aida/run_aida_neighbor_size.sh     # Test neighbor sampling sizes
bash aida/run_aida_relation_args.sh     # Test relation module configurations
bash aida/run_aida_base_encoder.sh      # Compare base encoders

Baseline Experiments

Machine learning baselines:

bash aida/ml_baseline.sh                # XGBoost, LightGBM, CatBoost
bash aida/sklearn_baseline.sh           # Random Forest, etc.

Neural network baselines:

bash aida/fit_best_baseline.sh          # Best DNN baseline
bash aida/fit_medium_baseline.sh        # Medium DNN baseline
bash aida/fit_low_baseline.sh           # Low DNN baseline
bash aida/tpberta_medium_baseline.sh    # TPBerta baseline

Custom Training

For custom training with specific parameters:

python -m aida.aida_run \
    --db_name hm \
    --tf_cache_dir data/hm-tensor-frame \
    --task_name user-churn \
    --base_encoder tabm \
    --channels 128 \
    --relation_layer_num 2 \
    --num_neighbors 128 128 \
    --num_epochs 500

Key arguments:

  • --db_name: Database name (hm, avito, event, trial, ratebeer, olist, stack)
  • --task_name: Task name (e.g., user-churn, item-sales, user-repeat)
  • --base_encoder: Base encoder (mlp, tabm, dfm, resnet, fttrans, armnet)
  • --deactivate_fusion_module: Disable fusion module (ablation)
  • --deactivate_relation_module: Disable relation module (ablation)
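To sweep these arguments programmatically rather than editing shell scripts, the invocation can be assembled in Python. The flag names below are taken from this README; the helper function itself is a hypothetical convenience, not part of the repository.

```python
# Hypothetical helper that builds the `python -m aida.aida_run` command line
# for a sweep over base encoders. Flags mirror the README's "Key arguments".
import shlex

def make_run_command(db_name, task_name, base_encoder, channels=128, epochs=500):
    """Return the argv list for one training run."""
    return [
        "python", "-m", "aida.aida_run",
        "--db_name", db_name,
        "--task_name", task_name,
        "--base_encoder", base_encoder,
        "--channels", str(channels),
        "--num_epochs", str(epochs),
    ]

# Example: print the command for each encoder (pass to subprocess.run to execute).
for enc in ["mlp", "tabm", "dfm", "resnet", "fttrans"]:
    print(shlex.join(make_run_command("hm", "user-churn", enc)))
```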

Experiment Results

(Figure: experiment results)
