local training - Jordan's model #1
Open
This pull request introduces significant improvements to the local training pipeline for chess models: a new training script, enhancements to dataset processing, and updates to the transformer model architecture.
Training Pipeline Enhancements:
Added local_training.py, which sets up and runs a training loop for chess models using PyTorch, including dataloader setup, evaluation, and model saving. This script enables local training with either the ChessCNN or ChessTransformer model.
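A minimal sketch of what such a training loop might look like, assuming the ChessCNN/ChessTransformer models and StockfishDataset from this PR; the import paths, hyperparameters, loss function, and checkpoint paths below are illustrative assumptions, not the actual contents of local_training.py:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split

from stockfishdataset import StockfishDataset  # dataset added in this PR
from transformer import ChessTransformer       # hypothetical import path

def train(model, dataset, epochs=10, batch_size=256, lr=1e-3, device="cuda"):
    # Hold out a small evaluation split.
    eval_size = int(0.05 * len(dataset))
    train_set, eval_set = random_split(dataset, [len(dataset) - eval_size, eval_size])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    eval_loader = DataLoader(eval_set, batch_size=batch_size)

    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # e.g. regression on Stockfish evaluations (assumed objective)

    for epoch in range(epochs):
        model.train()
        for boards, targets in train_loader:
            boards, targets = boards.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(boards), targets)
            loss.backward()
            optimizer.step()

        # Periodic evaluation and checkpointing.
        model.eval()
        with torch.no_grad():
            eval_loss = sum(
                loss_fn(model(b.to(device)), t.to(device)).item()
                for b, t in eval_loader
            ) / len(eval_loader)
        print(f"epoch {epoch}: eval loss {eval_loss:.4f}")
        torch.save(model.state_dict(), f"checkpoints/model_epoch{epoch}.pt")
```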
Dataset Handling Improvements:
Updated fen_to_bitboards in stockfishdataset.py to output a 12x8x8 tensor for each board position, supporting both white and black to move, and added a StockfishDataset class compatible with PyTorch's Dataset API. This makes data loading and preprocessing more robust and efficient for model training.
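A rough sketch of this data path, assuming positions arrive as (FEN, evaluation) pairs and that python-chess is used for parsing; the input format and plane ordering here are assumptions rather than the PR's exact code:

```python
import chess
import torch
from torch.utils.data import Dataset

def fen_to_bitboards(fen: str) -> torch.Tensor:
    """Encode a FEN position as a 12x8x8 tensor: one plane per (colour, piece type)."""
    board = chess.Board(fen)
    planes = torch.zeros(12, 8, 8)
    for square, piece in board.piece_map().items():
        # Planes 0-5 hold white pieces, planes 6-11 hold black pieces.
        plane = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[plane, square // 8, square % 8] = 1.0
    return planes

class StockfishDataset(Dataset):
    """Wraps (FEN, evaluation) pairs so a PyTorch DataLoader can consume them."""
    def __init__(self, positions):
        # positions: list of (fen, score) tuples -- an assumed input format.
        self.positions = positions

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, idx):
        fen, score = self.positions[idx]
        return fen_to_bitboards(fen), torch.tensor(score, dtype=torch.float32)
```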
Transformer Model Architecture Updates:
Added a MultiheadAttention module that uses Flash Attention for improved efficiency, although it is not yet integrated into the encoder. Increased the number of encoder layers in the ChessTransformer model from 4 to 6, potentially improving model capacity and performance.
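One way such a module could be written is with torch.nn.functional.scaled_dot_product_attention, which dispatches to a Flash Attention kernel on supported hardware; the projection layout and dimensions below are assumptions, not the PR's exact implementation:

```python
import torch
from torch import nn
import torch.nn.functional as F

class MultiheadAttention(nn.Module):
    """Self-attention block backed by PyTorch's fused scaled_dot_product_attention."""
    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        self.dropout = dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        batch, seq_len, _ = x.shape
        qkv = self.qkv_proj(x).reshape(batch, seq_len, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (batch, heads, seq, head_dim)
        # Uses the Flash Attention kernel when the backend supports it.
        attn = F.scaled_dot_product_attention(
            q, k, v, dropout_p=self.dropout if self.training else 0.0
        )
        attn = attn.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out_proj(attn)
```

The layer-count change amounts to building the encoder stack with six layers instead of four, e.g. via nn.TransformerEncoder(encoder_layer, num_layers=6) if the model uses PyTorch's built-in encoder.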