This project implements a Transformer model for sequence-to-sequence tasks, such as translation or text generation. The model is built using PyTorch and includes training and evaluation scripts, as well as utilities for data loading and logging.
- Implementation of the Transformer architecture.
- Support for training and evaluation on custom datasets.
- Logging of training and evaluation metrics.
- Easy integration with Hugging Face's `transformers` library for tokenization.
- Python 3.9 or higher
- PyTorch 1.7.0 or higher
- Transformers library
- Other dependencies listed in `requirements.txt`
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/transformer-seq2seq.git
  cd transformer-seq2seq
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Download the necessary datasets and place them in the `data/` directory (see the tokenization sketch below).
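The exact data format expected in `data/` depends on the repository's loaders. Purely as an illustration of the Hugging Face tokenization mentioned in the feature list, the sketch below tokenizes raw text lines with a `bert-base-uncased` tokenizer, whose 30,522-token vocabulary happens to match the `src_vocab_size`/`tgt_vocab_size` defaults in the example configuration; the model name and file layout are assumptions, not requirements of this project.

```python
from transformers import AutoTokenizer

# Assumed tokenizer; its vocabulary size (30522) lines up with the example config.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

with open("data/train_data.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

# Pad/truncate to the configured maximum sequence length (128).
batch = tokenizer(
    lines[:32],
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # e.g. torch.Size([32, 128])
```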
To train the Transformer model, run the following command:
```bash
python train/train.py
```

This will start the training process using the configurations specified in `config/config.yaml`. The trained model will be saved in the `checkpoints/` directory.
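For orientation, the snippet below sketches one teacher-forced training step for a seq2seq Transformer. It is an illustration only: it uses PyTorch's built-in `nn.Transformer` as a stand-in for this repository's own model, and the hyperparameters are copied from the example configuration shown later; it is not the contents of `train/train.py`.

```python
import torch
import torch.nn as nn

# Illustrative values mirroring config/config.yaml (assumptions, not the repo's code).
PAD_IDX, VOCAB, D_MODEL = 0, 30522, 512

class TinySeq2Seq(nn.Module):
    """Minimal encoder-decoder wrapper around nn.Transformer (stand-in model)."""

    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_IDX)
        self.tgt_emb = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_IDX)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8, num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, dropout=0.1,
            batch_first=True,  # batch_first requires PyTorch >= 1.9
        )
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, tgt):
        # Causal mask keeps each target position from attending to future tokens.
        causal_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(
            self.src_emb(src), self.tgt_emb(tgt),
            tgt_mask=causal_mask,
            src_key_padding_mask=(src == PAD_IDX),
            tgt_key_padding_mask=(tgt == PAD_IDX),
        )
        return self.out(hidden)

model = TinySeq2Seq()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)   # skip padded target positions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One teacher-forced step on a dummy batch: feed tgt[:, :-1], predict tgt[:, 1:].
src = torch.randint(1, VOCAB, (2, 16))
tgt = torch.randint(1, VOCAB, (2, 16))
logits = model(src, tgt[:, :-1])                         # (batch, tgt_len - 1, vocab)
loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```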
To evaluate the trained model, run the following command:
```bash
python evaluate/evaluate.py
```

This will load the saved model and evaluate it on the validation dataset, logging the results.
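Continuing the illustrative sketch from the training section above (again, not the actual contents of `evaluate/evaluate.py`; the checkpoint filename and `state_dict` layout are assumptions), evaluation typically amounts to loading a saved checkpoint, switching to eval mode, and computing the loss without gradients:

```python
import torch

# Reuses `model`, `criterion`, `src`, `tgt`, and VOCAB from the training sketch above.
state = torch.load("checkpoints/model.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()  # disable dropout

with torch.no_grad():
    logits = model(src, tgt[:, :-1])
    val_loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
print(f"validation loss: {val_loss.item():.4f}")
```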
The configuration for the model, training parameters, and data paths is specified in the `config/config.yaml` file. Here is an example of the configuration structure:
```yaml
training:
  device: "cuda"        # or "cpu"
  learning_rate: 0.001
  epochs: 10
  batch_size: 32

data:
  train_file: "data/train_data.txt"
  val_file: "data/val_data.txt"

model:
  src_vocab_size: 30522 # Vocabulary size for source language
  tgt_vocab_size: 30522 # Vocabulary size for target language
  src_seq_len: 128      # Maximum source sequence length
  tgt_seq_len: 128      # Maximum target sequence length
  d_model: 512          # Embedding dimension
  num_layers: 6         # Number of encoder/decoder layers
  num_heads: 8          # Number of attention heads
  dropout: 0.1          # Dropout rate
  d_ff: 2048            # Feed-forward hidden dimension
  pad_idx: 0            # Padding index for loss computation
```
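How these values are consumed is up to the training and evaluation scripts; as a rough sketch (not necessarily how `train/train.py` does it), a YAML file like the one above can be read with PyYAML and used as plain dictionaries:

```python
import torch
import yaml

# Load the example configuration shown above.
with open("config/config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["d_model"])        # 512
print(cfg["training"]["batch_size"])  # 32

# Fall back to CPU if CUDA was requested but is unavailable.
device = torch.device(cfg["training"]["device"] if torch.cuda.is_available() else "cpu")
```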
This project is licensed under the MIT License. See the LICENSE file for details.

- The original Transformer model was introduced in the paper "Attention Is All You Need" by Vaswani et al.
- This implementation uses the Hugging Face Transformers library for tokenization and model handling.
- Replace `https://github.com/yourusername/transformer-seq2seq.git` with the actual URL of your GitHub repository.
- Adjust the sections and content as necessary to fit your project's specifics, including any additional features or instructions.
- Ensure that the `requirements.txt` file is created and includes all necessary dependencies for your project.