This repository contains the source code for the paper "NeurIDA: Dynamic Modeling For Effective In-Database Analytics", in particular the algorithmic components of Dynamic In-Database Modeling (DIME).
We present the DIME modeling framework, focusing on its composable base model architecture and its execution flow. When model augmentation is invoked, DIME executes a modeling pipeline tailored to the specific analytical task. The framework first builds a relational graph containing tuples from the target table and related tables. It then dynamically constructs a bespoke model for this graph from the selected base model and shared model components. Finally, it uses the constructed model to generate predictions for tuples in the target table.
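The three stages described above (graph construction, model composition, prediction) can be illustrated with a toy, dependency-free sketch. This is not the DIME implementation: the names `build_graph`, `compose_model`, `predict`, and `BASE_ENCODERS`, the row layout, and the "own encoding plus mean neighbor encoding" rule are all hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable, Dict, List, Tuple

NodeId = Tuple[str, int]  # (table name, row index); hypothetical node identifier

# Hypothetical stand-ins for selectable base models: each maps a feature list to a score.
BASE_ENCODERS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda x: float(sum(x)),
    "mean": lambda x: sum(x) / len(x),
}


def build_graph(tables, foreign_keys):
    """Stage 1: tuples become nodes, foreign-key references become undirected edges."""
    nodes = {(t, i): row for t, rows in tables.items() for i, row in enumerate(rows)}
    edges: Dict[NodeId, List[NodeId]] = {n: [] for n in nodes}
    for child, fk_col, parent in foreign_keys:
        for i, row in enumerate(tables[child]):
            src, dst = (child, i), (parent, int(row[fk_col]))
            edges[src].append(dst)
            edges[dst].append(src)
    return nodes, edges


def compose_model(base_encoder: str, nodes, edges):
    """Stage 2: combine the selected base encoder with a shared aggregation step."""
    encode = BASE_ENCODERS[base_encoder]

    def model(node: NodeId) -> float:
        own = encode(nodes[node]["x"])
        neigh = [encode(nodes[m]["x"]) for m in edges[node]]
        context = sum(neigh) / len(neigh) if neigh else 0.0
        return own + context  # toy fusion of tuple and neighborhood signals

    return model


def predict(target: str, tables, model) -> List[float]:
    """Stage 3: score every tuple in the target table."""
    return [model((target, i)) for i in range(len(tables[target]))]


# Toy database: "x" holds features, "user_id" is a row index into "users".
tables = {
    "users": [{"x": [1.0, 2.0]}, {"x": [0.0, 1.0]}],
    "orders": [{"x": [10.0], "user_id": 0}, {"x": [20.0], "user_id": 0}],
}
nodes, edges = build_graph(tables, [("orders", "user_id", "users")])
model = compose_model("sum", nodes, edges)
predictions = predict("users", tables, model)
print(predictions)  # [18.0, 1.0]
```

The first user has two linked orders, so its score combines its own encoding (3.0) with the mean of the order encodings (15.0); the second user has no neighbors and keeps only its own encoding.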
neurida/
├── model/ # Core model implementations
│ ├── aida.py # Main AIDA framework (AIDAXFormer, relation modules, encoders)
│ ├── rdb.py # RDB baseline model with HeteroGraphSAGE
│ ├── base.py # Base architecture components
│ ├── tabular/ # Tabular encoders (TabM, DeepFM, ARMNet, etc.)
│ └── layer/ # Custom layers (fusion, relation convolution)
├── aida/ # AIDA experiment framework
│ ├── aida_run.py # Main training script
│ ├── prompt/ # LLM-based prompt generation
│ ├── db/ # Database profiling utilities
│ └── run_*.sh # Experiment shell scripts
├── utils/ # Utility functions
│ ├── data/ # Dataset implementations and factory
│ ├── builder.py # Graph construction utilities
│ ├── sample.py # Neighbor sampling
│ └── preprocess.py # Type inference and preprocessing
├── cmds/ # Command-line tools for baselines
├── data/ # Data directory (download required)
└── environment.yml # Conda environment specification
# Clone the repository
git clone <repository-url>
cd neurida
# Create and activate conda environment
conda env create -f environment.yml
conda activate deepdb
Supported Datasets:
- H&M Fashion (hm): Fashion retail transactions
- Avito (avito): Online classifieds platform
- Event (event): Event attendance data
- RateBeer (ratebeer): Beer ratings and reviews
- OLIST (olist): Brazilian e-commerce
- Trial (trial): Medical trial outcomes
- Stack Overflow (stack): Developer engagement
The data/ directory contains two main types of artifacts:
- Tabular Data: Flattened relational data with various feature engineering levels
- TensorFrame Data: Materialized database graph structures stored as PyTorch tensors
Data files are excluded from git. You can download them from the official website; otherwise, they are generated automatically on first run. See data/README.md for more details.
Run full experiments across all datasets and encoders:
bash aida/run_aida_experiments.sh
This runs experiments with multiple base encoders (mlp, tabm, dfm, resnet, fttrans) on classification and regression tasks.
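To preview what a sweep over base encoders might look like, the sketch below assembles one `python -m aida.aida_run` command per encoder and prints it without executing anything. It does not reproduce what `run_aida_experiments.sh` does internally. The helper `build_command` is hypothetical, and the `data/<db_name>-tensor-frame` cache path is an assumption generalised from the single `hm` example in this README.

```python
import shlex
from typing import List

# Encoder names as listed in this README for the full experiment run.
ENCODERS = ["mlp", "tabm", "dfm", "resnet", "fttrans"]


def build_command(encoder: str, db_name: str = "hm",
                  task_name: str = "user-churn") -> List[str]:
    """Assemble an argument list using only flags documented in this README."""
    return [
        "python", "-m", "aida.aida_run",
        "--db_name", db_name,
        # Assumption: the cache directory follows the pattern of the hm example.
        "--tf_cache_dir", f"data/{db_name}-tensor-frame",
        "--task_name", task_name,
        "--base_encoder", encoder,
    ]


commands = [build_command(encoder) for encoder in ENCODERS]
for cmd in commands:
    # Dry run: print the command instead of launching training.
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later without shell quoting issues.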
Single dataset/encoder experiments:
bash aida/run_aida_single_dataset.sh # Test on a specific dataset
bash aida/run_aida_single_encoder.sh # Test a specific encoder
Ablation studies:
bash aida/run_aida_ablation.sh # Test impact of model components
bash aida/run_aida_neighbor_size.sh # Test neighbor sampling sizes
bash aida/run_aida_relation_args.sh # Test relation module configurations
bash aida/run_aida_base_encoder.sh # Compare base encoders
Machine learning baselines:
bash aida/ml_baseline.sh # XGBoost, LightGBM, CatBoost
bash aida/sklearn_baseline.sh # Random Forest, etc.
Neural network baselines:
bash aida/fit_best_baseline.sh # Best DNN baseline
bash aida/fit_medium_baseline.sh # Medium DNN baseline
bash aida/fit_low_baseline.sh # Low DNN baseline
bash aida/tpberta_medium_baseline.sh # TPBerta baseline
For custom training with specific parameters:
python -m aida.aida_run \
--db_name hm \
--tf_cache_dir data/hm-tensor-frame \
--task_name user-churn \
--base_encoder tabm \
--channels 128 \
--relation_layer_num 2 \
--num_neighbors 128 128 \
--num_epochs 500
Key arguments:
- --db_name: Database name (hm, avito, event, trial, ratebeer, olist, stack)
- --task_name: Task name (e.g., user-churn, item-sales, user-repeat)
- --base_encoder: Base encoder (mlp, tabm, dfm, resnet, fttrans, armnet)
- --deactivate_fusion_module: Disable fusion module (ablation)
- --deactivate_relation_module: Disable relation module (ablation)
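As a rough illustration of how the flags above fit together, here is a minimal `argparse` sketch that accepts the same options. It is not the parser in `aida/aida_run.py`, which may differ. The function `build_parser` is hypothetical, and the default values are assumptions copied from the example command rather than confirmed defaults.

```python
import argparse

DATABASES = ["hm", "avito", "event", "trial", "ratebeer", "olist", "stack"]
ENCODERS = ["mlp", "tabm", "dfm", "resnet", "fttrans", "armnet"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="aida_run")
    parser.add_argument("--db_name", required=True, choices=DATABASES)
    parser.add_argument("--task_name", required=True)
    parser.add_argument("--tf_cache_dir")
    parser.add_argument("--base_encoder", choices=ENCODERS, default="tabm")  # assumed default
    parser.add_argument("--channels", type=int, default=128)                 # assumed default
    parser.add_argument("--relation_layer_num", type=int, default=2)         # assumed default
    # One value per sampling hop, e.g. "--num_neighbors 128 128".
    parser.add_argument("--num_neighbors", type=int, nargs="+", default=[128, 128])
    parser.add_argument("--num_epochs", type=int, default=500)               # assumed default
    # Ablation switches: absent means the module stays active.
    parser.add_argument("--deactivate_fusion_module", action="store_true")
    parser.add_argument("--deactivate_relation_module", action="store_true")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args([
    "--db_name", "hm",
    "--task_name", "user-churn",
    "--num_neighbors", "64", "32",
    "--deactivate_fusion_module",
])
print(args.base_encoder, args.num_neighbors, args.deactivate_fusion_module)
```

Modeling the ablation options as `store_true` switches means a run without them uses the full model, which matches how the flags are named.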

