A rigorous, reproducible framework for testing ML-based alpha generation strategies.
This research demonstrates that apparent alpha from ML models on public market data is primarily an artifact of overfitting.
The critical comparison between experiments C6 and C7 makes this concrete:
| Metric | C6 (Weak Regularization) | C7 (Strong Regularization) |
|---|---|---|
| Features | 41 | 8 |
| Ridge α | 1.0 | 100.0 |
| Mean IC | +0.036 | -0.084 |
| p-value | 0.14 | 0.001 |
| Validation R² | All negative | All negative |
The IC flips from positive to significantly negative when overfitting is controlled.
This is the classic signature of spurious patterns: complex models find noise, simple models reveal truth.
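The IC statistics above can be reproduced in miniature. The sketch below (hypothetical data, not the framework's own code) computes a per-period Spearman rank IC between model scores and realized returns, then applies the one-sample t-test used throughout the experiments. With pure noise inputs, the mean IC should be statistically indistinguishable from zero.

```python
import numpy as np
from scipy.stats import spearmanr, ttest_1samp

rng = np.random.default_rng(0)

# Hypothetical panel: 50 periods, 100 assets, scores and returns are
# independent noise, so any nonzero mean IC is sampling variation.
n_periods, n_assets = 50, 100
ics = []
for _ in range(n_periods):
    preds = rng.normal(size=n_assets)   # model scores for this period
    rets = rng.normal(size=n_assets)    # realized forward returns
    ic, _ = spearmanr(preds, rets)      # rank IC for one period
    ics.append(ic)

ics = np.asarray(ics)
# IC t-test: is the mean IC significantly different from zero?
t_stat, p_value = ttest_1samp(ics, 0.0)
print(f"mean IC = {ics.mean():+.4f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

The same computation, run on C6's 41-feature model versus C7's regularized 8-feature model, is what produces the sign flip in the table.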
| ID | Description | Key Result | Status |
|---|---|---|---|
| E1 | LSTM Overfitting Demo | Train IC +0.86 → Val IC -0.06 | ❌ Severe overfitting demonstrated |
| E2 | Cross-Sectional Targets | IC ≈ 0.02-0.03, normal decay | ❌ Weak signal |
| E3 | Fundamentals (BIASED) | IC +0.18, p=0.0000 | ⚠️ Look-ahead bias |
| E4 | Fundamentals (Corrected) | IC ≈ 0.02, p > 0.10 | ❌ No significance after correction |
| ID | Description | Key Result | Status |
|---|---|---|---|
| C1 | Time-Series Direction | Edge ~0.5%, IC ~0.04 | ❌ Marginal, not robust |
| C2 | Technical Alpha | IC -0.01, p=0.55 | ❌ Not significant |
| C3 | Production System | Sharpe 0.69, p=0.02 | |
| C6 | Overfitting Demo | IC +0.036, Val R² < 0 | ⚠️ Shows overfitting |
| C7 | Anti-Overfitting | IC -0.084, p=0.001 | ✅ MAIN RESULT |
```bash
git clone https://github.com/yourusername/alpha_research_framework.git
cd alpha_research_framework
pip install -r requirements.txt
```

```bash
# Run the critical C6 vs C7 comparison
python run_experiments.py --compare

# List all experiments
python run_experiments.py --list

# Run specific experiment
python run_experiments.py -e C7   # Main result
python run_experiments.py -e E1   # Overfitting demo
python run_experiments.py -e E3   # Look-ahead bias demo

# Show documented results without running
python run_experiments.py --show

# Full test suite (~15-20 minutes)
python run_experiments.py --all
```

```
alpha_research_framework/
├── README.md                  # This file
├── requirements.txt           # Dependencies
├── run_experiments.py         # Unified experiment runner
├── experiments/
│   ├── equity/
│   │   ├── E1_single_stock_lstm.py     # LSTM overfitting
│   │   ├── E2_cross_sectional.py       # Cross-sectional targets
│   │   ├── E3_fundamentals_bias.py     # Look-ahead bias demo
│   │   └── E4_annual_fundamentals.py   # Bias-corrected
│   └── crypto/
│       ├── C1_timeseries.py            # Direction prediction
│       ├── C2_technical_alpha.py       # Technical indicators
│       ├── C3_production_system.py     # Full system
│       ├── C6_overfitting_demo.py      # ⚠️ Shows overfitting
│       └── C7_robust_final.py          # ✅ Main result
├── docs/
│   ├── RESEARCH_JOURNEY.md    # Full narrative
│   ├── METHODOLOGY.md         # Technical details
│   └── RESULTS_SUMMARY.md     # All results
└── results/
    └── documented_results.json
```
- Walk-Forward Validation: Train on past, test on future (no shuffling)
- Purge Gap: 5-10 day gap between train/test to prevent leakage
- Feature Reduction: 8 features max (vs 41 in overfit version)
- Strong Regularization: Ridge α = 100 (vs 1.0 in overfit version)
- Cross-Sectional Targets: Rank-based to remove market trend
- IC t-test: Test if mean IC is significantly different from zero
- Multiple Folds: 40-70 walk-forward windows per experiment
- Validation R²: Must be non-negative (negative = overfitting)
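The walk-forward scheme with a purge gap can be sketched as follows. This is a minimal illustration of the idea (the window sizes and the `purged_walk_forward` helper are illustrative, not the framework's actual implementation):

```python
import numpy as np

def purged_walk_forward(n_samples, train_size, test_size, purge_gap=5):
    """Yield (train_idx, test_idx) windows in strict time order.

    The purge_gap of rows between train and test prevents label leakage
    when forward-return targets overlap the train/test boundary.
    """
    start = 0
    while start + train_size + purge_gap + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_start = start + train_size + purge_gap
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx
        start += test_size  # roll forward; never shuffle time-series data

# Example: 500 daily rows, 250-day train, 20-day test, 5-day purge gap
folds = list(purged_walk_forward(500, 250, 20, purge_gap=5))
for tr, te in folds:
    assert tr.max() + 5 < te.min()  # the purge gap holds in every fold
```

Each fold's model (e.g. Ridge with α = 100) is fit on `train_idx` only, and the per-fold test ICs feed the t-test described above.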
- Neural networks overfit easily on financial data (E1)
- Look-ahead bias creates fake alpha - always check data timestamps (E3 vs E4)
- Positive IC with negative validation R² = overfitting (C6)
- Strong regularization reveals truth (C7)
- Public price data has no exploitable alpha with standard ML
- Walk-forward validation with purge gaps
- Cross-sectional (ranking) targets
- Minimal features (8-10 max)
- Strong regularization (Ridge α ≥ 100)
- Statistical significance testing
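The cross-sectional (ranking) target from the list above can be built with a per-date rank transform. A minimal sketch with hypothetical data (column names are illustrative):

```python
import pandas as pd

# Hypothetical panel: one forward return per (date, asset) pair.
df = pd.DataFrame({
    "date":    ["d1"] * 4 + ["d2"] * 4,
    "asset":   ["A", "B", "C", "D"] * 2,
    "fwd_ret": [0.02, -0.01, 0.00, 0.03, -0.02, 0.01, 0.04, -0.03],
})

# Rank within each date, scaled to (0, 1]. This strips out the common
# market move, so the model learns relative performance only (no beta).
df["target"] = df.groupby("date")["fwd_ret"].rank(pct=True)
print(df)
```

Predicting this rank target instead of raw returns is what removes the market-trend component mentioned in the methodology.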
- Complex models (LSTM, deep MLP) without heavy regularization
- Many features (>20) without selection
- Absolute return targets (includes beta)
- Using current data for historical predictions
- RESEARCH_JOURNEY.md: Full narrative from hypothesis to null result
- METHODOLOGY.md: Technical details and anti-overfitting checklist
- RESULTS_SUMMARY.md: Complete results tables
This framework demonstrates:
- Rigorous methodology for financial ML research
- Honest null result - finding no alpha is valid science
- Reproducible experiments with clear documentation
- Overfitting detection techniques applicable to any ML project
This is research code for educational purposes. Results are based on historical data and do not guarantee future performance. This is not financial advice.
MIT License - see LICENSE for details.
Contributions welcome! Please read the methodology documentation first to understand the anti-overfitting principles.
@software{alpha_research_framework,
title={Alpha Research Framework: A Rigorous Approach to ML-Based Alpha Generation},
year={2025},
url={https://github.com/BianchiGiacomo/alpha-research-framework}
}