A deep reinforcement learning chess engine implementing the AlphaZero algorithm with supervised fine-tuning and Monte Carlo Tree Search self-play.
This project implements a complete chess AI training pipeline inspired by DeepMind's AlphaZero. The system uses a two-stage approach: supervised learning from expert games followed by reinforcement learning through self-play with MCTS.
Key Features:
- ResNet-based policy-value network (~10M parameters)
- Canonical 4672-action move encoding for all legal chess moves (see the sketch after this list)
- MCTS-guided self-play for reinforcement learning
- Flexible training via CLI or Google Colab notebooks
- Comprehensive evaluation tools
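The 4672-dimensional action space mentioned above follows the AlphaZero convention of 64 origin squares × 73 move planes (56 queen-style moves, 8 knight moves, 9 underpromotions). The sketch below shows one such mapping using python-chess; the actual plane ordering lives in core/chess_logic and may differ from this illustration.

```python
# Sketch of one common 64 x 73 = 4672 move-indexing scheme (AlphaZero-style).
# The plane layout below is an illustrative assumption, not necessarily the
# exact encoding implemented in core/chess_logic.
import chess

QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def move_to_index(move: chess.Move) -> int:
    """Map a move to an integer in [0, 4672) as from_square * 73 + plane."""
    from_sq, to_sq = move.from_square, move.to_square
    df = chess.square_file(to_sq) - chess.square_file(from_sq)
    dr = chess.square_rank(to_sq) - chess.square_rank(from_sq)

    if move.promotion and move.promotion != chess.QUEEN:
        # Planes 64-72: underpromotions (3 directions x knight/bishop/rook).
        direction = {-1: 0, 0: 1, 1: 2}[df]
        piece = {chess.KNIGHT: 0, chess.BISHOP: 1, chess.ROOK: 2}[move.promotion]
        plane = 64 + direction * 3 + piece
    elif (df, dr) in KNIGHT_OFFSETS:
        # Planes 56-63: knight moves.
        plane = 56 + KNIGHT_OFFSETS.index((df, dr))
    else:
        # Planes 0-55: queen-style moves, 8 directions x up to 7 squares.
        distance = max(abs(df), abs(dr))
        direction = QUEEN_DIRS.index((df // distance, dr // distance))
        plane = direction * 7 + (distance - 1)

    return from_sq * 73 + plane
```

Queen promotions are encoded as ordinary queen-style moves, which is why only underpromotions need dedicated planes.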
Installation:

pip install -r requirements.txt
pip install -e .

Stage 1 - Supervised Fine-Tuning:
python -m sft.train configs/sft_config.yaml

Stage 2 - Reinforcement Learning:
python -m rl.train configs/rl_config.yaml

Alternative - Google Colab:
- Upload notebooks/train_sft.ipynb or notebooks/train_rl.ipynb
- All dependencies are self-contained
- No additional setup required
# Test against Minimax AI
python tests/test_vs_minimax.py models/your_model.pth --games 20 --depth 3
# Watch games in detail
python tests/test_vs_minimax.py models/your_model.pth --games 2 --verbose
# Evaluate performance
python scripts/evaluate.py models/your_model.pth --games 100

Neural Network:
- Input: 32-channel 8×8 board representation
- Backbone: 6-block ResNet with 64 channels
- Policy Head: 4672-dimensional move probabilities
- Value Head: Scalar position evaluation (-1 to +1)
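A minimal PyTorch sketch of that architecture is shown below, assuming standard residual blocks and small convolutional heads; the project's actual layer definitions are in core/models/ and their exact shapes and parameter count may differ.

```python
# Minimal sketch of the policy-value network described above. Layer sizes and
# head designs are assumptions; see core/models/ for the real implementation.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # residual connection

class PolicyValueNet(nn.Module):
    def __init__(self, in_channels: int = 32, channels: int = 64,
                 blocks: int = 6, action_size: int = 4672):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.trunk = nn.Sequential(*[ResBlock(channels) for _ in range(blocks)])
        # Policy head: per-square features flattened into 4672 move logits.
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 32, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(32 * 8 * 8, action_size))
        # Value head: scalar evaluation squashed to [-1, +1].
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 8, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(8 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        x = self.trunk(self.stem(x))
        return self.policy_head(x), self.value_head(x)
```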
Training Pipeline:
- SFT Stage: Learn from PGN game databases
- RL Stage: Improve through MCTS self-play
- Evaluation: Test against Minimax or other models
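Both stages optimize the same two-headed objective from the AlphaZero paper: cross-entropy between predicted move probabilities and the policy targets (expert moves during SFT, MCTS visit distributions during RL), plus mean-squared error on the game outcome. A minimal sketch, with illustrative names rather than the project's API:

```python
# AlphaZero-style training objective: policy cross-entropy + value MSE.
# Weight decay (the regularization term in the paper) would normally be
# handled by the optimizer.
import torch
import torch.nn.functional as F

def policy_value_loss(policy_logits, value_pred, target_policy, target_value):
    """policy_logits/target_policy: (B, 4672); value_pred/target_value: (B, 1) in [-1, +1]."""
    policy_loss = -(target_policy * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    value_loss = F.mse_loss(value_pred, target_value)
    return policy_loss + value_loss
```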
Project Structure:

ML/
├── core/ # Core components
│ ├── models/ # Neural architectures
│ ├── chess_logic/ # Move/board encoding
│ └── utils/ # Checkpoints, logging
├── sft/ # Supervised training
├── rl/ # Reinforcement learning
│ ├── mcts/ # Monte Carlo Tree Search
│ └── self_play/ # Self-play generation
├── configs/ # Training configurations
├── notebooks/ # Colab-ready notebooks
├── scripts/ # Utilities
├── tests/ # Evaluation tools
└── models/ # Checkpoints
Configuration:

Customize training in configs/sft_config.yaml or configs/rl_config.yaml:
Model Architecture:
- ResNet blocks and channels
- Input/output dimensions (ACTION_SIZE=4672 is fixed)
Training Parameters:
- Learning rate, batch size, epochs
- MCTS simulations, temperature
- Data paths and checkpoint locations
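The configs are plain YAML, so they can be inspected or loaded directly with PyYAML (already in requirements.txt). The key names suggested in the comment below are hypothetical; check configs/sft_config.yaml for the fields the training scripts actually expect.

```python
# Load and inspect a training config with PyYAML.
import yaml

with open("configs/sft_config.yaml") as f:
    cfg = yaml.safe_load(f)

# cfg is a nested dict; hypothetically it might contain sections such as:
# model:    {blocks: 6, channels: 64}
# training: {learning_rate: 1e-3, batch_size: 256, epochs: 10}
# data:     {pgn_dir: data/, checkpoint_dir: models/}
print(cfg)
```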
Training Data:

Place PGN files in the data/ folder. Recommended sources:
- FICS Games Database - Large collection of rated games
- Lichess Database - Monthly game archives
- CCRL - Computer chess games
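Downloaded PGN files can be sanity-checked with python-chess before training; the SFT data loader itself lives in sft/, and the file path below is only an example.

```python
# Quick sanity check that a PGN file parses and its games replay cleanly.
import chess.pgn

with open("data/example_games.pgn") as pgn:
    while True:
        game = chess.pgn.read_game(pgn)
        if game is None:
            break
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)   # training would encode each position here
        print(game.headers.get("Result"), board.fullmove_number)
```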
Training times (CPU, approximate):
- SFT: 2-5 min/epoch (500 games)
- RL: 10-20 min/iteration (30 games, 100 MCTS simulations)
GPU acceleration significantly reduces training time.
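The usual PyTorch device-selection pattern is shown below; the training scripts may already pick a device automatically, so treat this as a reference rather than a required step.

```python
# Run on GPU when one is available, otherwise fall back to CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on {device}")
```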
Requirements:

- Python 3.8+
- PyTorch 1.12+
- python-chess
- NumPy, tqdm, PyYAML
See requirements.txt for complete dependencies.
Testing Your Model:

Simply place your .pth checkpoint in models/ and run:
python tests/test_vs_minimax.py models/your_model.pth

The test automatically validates model compatibility and reports win rates.
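If the compatibility check fails, you can inspect a checkpoint by hand. The snippet below assumes the file is either a bare state_dict or a dict wrapping one; the actual checkpoint format is defined in core/utils/.

```python
# Peek at the parameter names stored in a checkpoint file.
import torch

ckpt = torch.load("models/your_model.pth", map_location="cpu")
if isinstance(ckpt, dict) and "state_dict" in ckpt:
    ckpt = ckpt["state_dict"]
print(list(ckpt.keys())[:5])  # first few tensor names
```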
Educational use only.
Based on the AlphaZero algorithm by DeepMind (Silver et al., 2017).