Transformer Model for Sequence-to-Sequence Tasks

This project implements a Transformer model for sequence-to-sequence tasks, such as translation or text generation. The model is built using PyTorch and includes training and evaluation scripts, as well as utilities for data loading and logging.

Table of Contents

  • Features
  • Requirements
  • Installation
  • Usage
  • Configuration
  • License
  • Acknowledgments

Features

  • Implementation of the Transformer architecture.
  • Support for training and evaluation on custom datasets.
  • Logging of training and evaluation metrics.
  • Easy integration with Hugging Face's transformers library for tokenization (see the sketch below).
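
The tokenizer integration can be illustrated with a short sketch. The example below assumes a BERT-style tokenizer from Hugging Face's transformers library; its vocabulary size (30522) and padding index (0) match the defaults in config/config.yaml, but the tokenizer the project actually uses may differ.

from transformers import AutoTokenizer

# Hypothetical choice of tokenizer: "bert-base-uncased" has a 30522-token
# vocabulary and uses 0 as its padding index, matching the default config.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["a source sentence", "another, longer source sentence"],
    padding="max_length",   # pad every sequence to the configured src_seq_len
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([2, 128])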

Requirements

  • Python 3.9 or higher
  • PyTorch 1.7.0 or higher
  • Transformers library
  • Other dependencies listed in requirements.txt

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/transformer-seq2seq.git
    cd transformer-seq2seq
  2. Install the required packages:

    pip install -r requirements.txt
  3. Download the necessary datasets and place them in the data/ directory.

Usage

Training

To train the Transformer model, run the following command:

python train/train.py

This will start the training process using the configurations specified in config/config.yaml. The trained model will be saved in the checkpoints/ directory.
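
As a rough sketch, a training entry point built around this configuration would load the YAML file, pick the device, and mask padding out of the loss. The snippet below is illustrative only; build_transformer is a hypothetical factory, not a function defined by this project.

import torch
import torch.nn as nn
import yaml

# Read the settings described in the Configuration section below.
with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

device = torch.device(cfg["training"]["device"])

# Padding positions are excluded from the loss via the configured pad_idx.
criterion = nn.CrossEntropyLoss(ignore_index=cfg["pad_idx"])

# Hypothetical model factory; the project's own construction code may differ.
# model = build_transformer(cfg["model"]).to(device)
# optimizer = torch.optim.Adam(model.parameters(), lr=cfg["training"]["learning_rate"])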

Evaluation

To evaluate the trained model, run the following command:

python evaluate/evaluate.py

This will load the saved model and evaluate it on the validation dataset, logging the results.
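
At a high level, the evaluation step amounts to loading the saved weights and computing the validation loss with padding ignored. The function below is a generic sketch under those assumptions; the model's call signature and the batch format of val_loader are illustrative, not the project's actual interfaces.

import torch
import torch.nn as nn

def evaluate(model, val_loader, pad_idx, device):
    """Average teacher-forced cross-entropy over the validation set (sketch)."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
    model.eval()
    total_loss, num_batches = 0.0, 0
    with torch.no_grad():
        for src, tgt in val_loader:           # assumed (source, target) token batches
            src, tgt = src.to(device), tgt.to(device)
            logits = model(src, tgt[:, :-1])  # feed target shifted right, predict next token
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            total_loss += loss.item()
            num_batches += 1
    return total_loss / max(num_batches, 1)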

Configuration

The configuration for the model, training parameters, and data paths is specified in the config/config.yaml file. Here is an example of the configuration structure:

training:
  device: "cuda" # or "cpu"
  learning_rate: 0.001
  epochs: 10
  batch_size: 32

data:
  train_file: "data/train_data.txt"
  val_file: "data/val_data.txt"

model:
  src_vocab_size: 30522 # Vocabulary size for source language
  tgt_vocab_size: 30522 # Vocabulary size for target language
  src_seq_len: 128 # Maximum source sequence length
  tgt_seq_len: 128 # Maximum target sequence length
  d_model: 512 # Embedding dimension
  num_layers: 6 # Number of encoder/decoder layers
  num_heads: 8 # Number of attention heads
  dropout: 0.1 # Dropout rate
  d_ff: 2048 # Feed-forward hidden dimension

pad_idx: 0 # Padding index for loss computation
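
To show how the model section maps onto an actual network, here is a minimal sketch built on torch.nn.Transformer. It is not this project's implementation (which may define its own layers), and it omits positional encodings and attention masks for brevity.

import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Illustrative seq2seq model assembled from the `model` section of the config."""

    def __init__(self, m):
        super().__init__()
        self.src_embed = nn.Embedding(m["src_vocab_size"], m["d_model"])
        self.tgt_embed = nn.Embedding(m["tgt_vocab_size"], m["d_model"])
        self.transformer = nn.Transformer(
            d_model=m["d_model"],
            nhead=m["num_heads"],
            num_encoder_layers=m["num_layers"],
            num_decoder_layers=m["num_layers"],
            dim_feedforward=m["d_ff"],
            dropout=m["dropout"],
        )
        self.generator = nn.Linear(m["d_model"], m["tgt_vocab_size"])

    def forward(self, src, tgt):
        # nn.Transformer expects (seq_len, batch, d_model) tensors by default.
        src_in = self.src_embed(src).transpose(0, 1)
        tgt_in = self.tgt_embed(tgt).transpose(0, 1)
        out = self.transformer(src_in, tgt_in)
        return self.generator(out).transpose(0, 1)  # back to (batch, seq_len, vocab)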

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments


Notes:

  • Replace https://github.com/yourusername/transformer-seq2seq.git with the actual URL of your GitHub repository.
  • Adjust the sections and content as necessary to fit your project's specifics, including any additional features or instructions.
  • Ensure that the requirements.txt file is created and includes all necessary dependencies for your project.
