Alpha Research Framework

A rigorous, reproducible framework for testing ML-based alpha generation strategies.

Python 3.8+ License: MIT

🎯 Key Finding

This research demonstrates that apparent alpha from ML models on public market data is primarily overfitting.

The critical comparison between experiments C6 and C7 demonstrates this:

| Metric         | C6 (Weak Regularization) | C7 (Strong Regularization) |
|----------------|--------------------------|----------------------------|
| Features       | 41                       | 8                          |
| Ridge α        | 1.0                      | 100.0                      |
| Mean IC        | +0.036                   | -0.084                     |
| p-value        | 0.14                     | 0.001                      |
| Validation R²  | All negative             | All negative               |

The IC flips from positive to significantly negative when overfitting is controlled.

This is the classic signature of spurious patterns: complex models find noise, simple models reveal truth.
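Throughout this README, IC denotes the information coefficient: the rank correlation between model predictions and subsequently realized returns. A minimal pure-NumPy sketch (the function name is illustrative and not from this repo; ties are ignored for simplicity):

```python
import numpy as np

def spearman_ic(predictions, realized):
    """Information coefficient: Spearman rank correlation between
    model predictions and realized returns (no tie handling)."""
    rank_p = np.argsort(np.argsort(predictions))  # rank of each prediction
    rank_r = np.argsort(np.argsort(realized))     # rank of each realized return
    return float(np.corrcoef(rank_p, rank_r)[0, 1])
```

An IC of +1 means predictions order assets exactly as their realized returns did; values near zero, as found here, mean the ranking carries no information.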


📊 Experiment Results Summary

Equity Experiments (E1-E4)

| ID | Description               | Key Result                     | Status                              |
|----|---------------------------|--------------------------------|-------------------------------------|
| E1 | LSTM Overfitting Demo     | Train IC +0.86 → Val IC -0.06  | ✅ Severe overfitting demonstrated   |
| E2 | Cross-Sectional Targets   | IC ≈ 0.02-0.03, normal decay   | ✅ Weak signal                       |
| E3 | Fundamentals (BIASED)     | IC +0.18, p=0.0000             | ⚠️ FAKE: look-ahead bias             |
| E4 | Fundamentals (Corrected)  | IC ≈ 0.02, p > 0.10            | ✅ No significance after correction  |

Crypto Experiments (C1-C7)

| ID | Description            | Key Result                | Status                  |
|----|------------------------|---------------------------|-------------------------|
| C1 | Time-Series Direction  | Edge ~0.5%, IC ~0.04      | ✅ Marginal, not robust  |
| C2 | Technical Alpha        | IC -0.01, p=0.55          | ✅ Not significant       |
| C3 | Production System      | Sharpe 0.69, p=0.02       | ⚠️ Borderline            |
| C6 | Overfitting Demo       | IC +0.036, Val R² < 0     | ⚠️ SPURIOUS              |
| C7 | Anti-Overfitting       | IC -0.084, p=0.001        | ✅ MAIN RESULT           |

🚀 Quick Start

Installation

git clone https://github.com/BianchiGiacomo/alpha-research-framework.git
cd alpha-research-framework
pip install -r requirements.txt

Run Key Comparison (Recommended)

# Run the critical C6 vs C7 comparison
python run_experiments.py --compare

Run Individual Experiments

# List all experiments
python run_experiments.py --list

# Run specific experiment
python run_experiments.py -e C7    # Main result
python run_experiments.py -e E1    # Overfitting demo
python run_experiments.py -e E3    # Look-ahead bias demo

# Show documented results without running
python run_experiments.py --show

Run All Experiments

# Full test suite (~15-20 minutes)
python run_experiments.py --all

📁 Project Structure

alpha_research_framework/
├── README.md                 # This file
├── requirements.txt          # Dependencies
├── run_experiments.py        # Unified experiment runner
├── experiments/
│   ├── equity/
│   │   ├── E1_single_stock_lstm.py    # LSTM overfitting
│   │   ├── E2_cross_sectional.py      # Cross-sectional targets
│   │   ├── E3_fundamentals_bias.py    # Look-ahead bias demo
│   │   └── E4_annual_fundamentals.py  # Bias-corrected
│   └── crypto/
│       ├── C1_timeseries.py           # Direction prediction
│       ├── C2_technical_alpha.py      # Technical indicators
│       ├── C3_production_system.py    # Full system
│       ├── C6_overfitting_demo.py     # ⚠️ Shows overfitting
│       └── C7_robust_final.py         # ✅ Main result
├── docs/
│   ├── RESEARCH_JOURNEY.md   # Full narrative
│   ├── METHODOLOGY.md        # Technical details
│   └── RESULTS_SUMMARY.md    # All results
└── results/
    └── documented_results.json

🔬 Methodology Highlights

Anti-Overfitting Techniques

  1. Walk-Forward Validation: Train on past, test on future (no shuffling)
  2. Purge Gap: 5-10 day gap between train/test to prevent leakage
  3. Feature Reduction: 8 features max (vs 41 in overfit version)
  4. Strong Regularization: Ridge α = 100 (vs 1.0 in overfit version)
  5. Cross-Sectional Targets: Rank-based to remove market trend
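Techniques 1 and 2 above can be sketched as a generator of purged walk-forward windows. This is an illustrative stand-in, not the repo's actual API; names and defaults are assumptions:

```python
def walk_forward_splits(n_samples, train_size, test_size, purge_gap):
    """Yield (train_idx, test_idx) windows in time order.
    A purge gap of `purge_gap` samples separates train from test,
    so labels that overlap the boundary cannot leak into the score."""
    start = 0
    while start + train_size + purge_gap + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test_start = start + train_size + purge_gap
        test = list(range(test_start, test_start + test_size))
        yield train, test
        start += test_size  # roll the window forward, never shuffle
```

Because every test window lies strictly after its training window (plus the gap), a model can only score well by generalizing forward in time.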

Statistical Rigor

  • IC t-test: Test if mean IC is significantly different from zero
  • Multiple Folds: 40-70 walk-forward windows per experiment
  • Validation RΒ²: Must be non-negative (negative = overfitting)

📈 Key Insights

What We Learned

  1. Neural networks overfit easily on financial data (E1)
  2. Look-ahead bias creates fake alpha - always check data timestamps (E3 vs E4)
  3. Positive IC with negative validation R² = overfitting (C6)
  4. Strong regularization reveals truth (C7)
  5. Public price data has no exploitable alpha with standard ML

What Works

  • Walk-forward validation with purge gaps
  • Cross-sectional (ranking) targets
  • Minimal features (8-10 max)
  • Strong regularization (Ridge Ξ± β‰₯ 100)
  • Statistical significance testing

What Doesn't Work

  • Complex models (LSTM, deep MLP) without heavy regularization
  • Many features (>20) without selection
  • Absolute return targets (includes beta)
  • Using current data for historical predictions

📚 Documentation

  • docs/RESEARCH_JOURNEY.md: full research narrative
  • docs/METHODOLOGY.md: technical details
  • docs/RESULTS_SUMMARY.md: complete results


🎓 Academic Value

This framework demonstrates:

  1. Rigorous methodology for financial ML research
  2. Honest null result - finding no alpha is valid science
  3. Reproducible experiments with clear documentation
  4. Overfitting detection techniques applicable to any ML project

⚠️ Disclaimer

This is research code for educational purposes. Results are based on historical data and do not guarantee future performance. This is not financial advice.


📄 License

MIT License - see LICENSE for details.


🤝 Contributing

Contributions welcome! Please read the methodology documentation first to understand the anti-overfitting principles.


📖 Citation

@software{alpha_research_framework,
  title={Alpha Research Framework: A Rigorous Approach to ML-Based Alpha Generation},
  year={2025},
  url={https://github.com/BianchiGiacomo/alpha-research-framework}
}
