This project implements a Transformer model for sequence-to-sequence tasks, such as translation or text generation. The model is built using PyTorch and includes training and evaluation scripts, as well as utilities for data loading and logging.
- Implementation of the Transformer architecture.
- Support for training and evaluation on custom datasets.
- Logging of training and evaluation metrics.
- Easy integration with Hugging Face's `transformers` library for tokenization.
- Python 3.9 or higher
- PyTorch 1.7.0 or higher
- Transformers library
- Other dependencies listed in `requirements.txt`
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/transformer-seq2seq.git
  cd transformer-seq2seq
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Download the necessary datasets and place them in the `data/` directory (see the tokenization sketch below).
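The exact data format expected in `data/` depends on the repository's loaders. Purely as an illustration of the Hugging Face tokenization mentioned in the feature list, the sketch below tokenizes raw text lines with a `bert-base-uncased` tokenizer, whose 30,522-token vocabulary happens to match the `src_vocab_size`/`tgt_vocab_size` defaults in the example configuration; the model name and file layout are assumptions, not requirements of this project.

```python
from transformers import AutoTokenizer

# Assumed tokenizer; its vocabulary size (30522) lines up with the example config.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

with open("data/train_data.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

# Pad/truncate to the configured maximum sequence length (128).
batch = tokenizer(
    lines[:32],
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # e.g. torch.Size([32, 128])
```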
To train the Transformer model, run the following command:
```bash
python train/train.py
```

This will start the training process using the configurations specified in `config/config.yaml`. The trained model will be saved in the `checkpoints/` directory.
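For orientation, the snippet below sketches one teacher-forced training step for a seq2seq Transformer. It is an illustration only: it uses PyTorch's built-in `nn.Transformer` as a stand-in for this repository's own model, and the hyperparameters are copied from the example configuration shown later; it is not the contents of `train/train.py`.

```python
import torch
import torch.nn as nn

# Illustrative values mirroring config/config.yaml (assumptions, not the repo's code).
PAD_IDX, VOCAB, D_MODEL = 0, 30522, 512

class TinySeq2Seq(nn.Module):
    """Minimal encoder-decoder wrapper around nn.Transformer (stand-in model)."""

    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_IDX)
        self.tgt_emb = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_IDX)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8, num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, dropout=0.1,
            batch_first=True,  # batch_first requires PyTorch >= 1.9
        )
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, tgt):
        # Causal mask keeps each target position from attending to future tokens.
        causal_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(
            self.src_emb(src), self.tgt_emb(tgt),
            tgt_mask=causal_mask,
            src_key_padding_mask=(src == PAD_IDX),
            tgt_key_padding_mask=(tgt == PAD_IDX),
        )
        return self.out(hidden)

model = TinySeq2Seq()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)   # skip padded target positions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One teacher-forced step on a dummy batch: feed tgt[:, :-1], predict tgt[:, 1:].
src = torch.randint(1, VOCAB, (2, 16))
tgt = torch.randint(1, VOCAB, (2, 16))
logits = model(src, tgt[:, :-1])                         # (batch, tgt_len - 1, vocab)
loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```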
To evaluate the trained model, run the following command:
```bash
python evaluate/evaluate.py
```

This will load the saved model and evaluate it on the validation dataset, logging the results.
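Continuing the illustrative sketch from the training section above (again, not the actual contents of `evaluate/evaluate.py`; the checkpoint filename and `state_dict` layout are assumptions), evaluation typically amounts to loading a saved checkpoint, switching to eval mode, and computing the loss without gradients:

```python
import torch

# Reuses `model`, `criterion`, `src`, `tgt`, and VOCAB from the training sketch above.
state = torch.load("checkpoints/model.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()  # disable dropout

with torch.no_grad():
    logits = model(src, tgt[:, :-1])
    val_loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
print(f"validation loss: {val_loss.item():.4f}")
```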
The configuration for the model, training parameters, and data paths is specified in the `config/config.yaml` file. Here is an example of the configuration structure:
```yaml
training:
  device: "cuda"        # or "cpu"
  learning_rate: 0.001
  epochs: 10
  batch_size: 32

data:
  train_file: "data/train_data.txt"
  val_file: "data/val_data.txt"

model:
  src_vocab_size: 30522 # Vocabulary size for source language
  tgt_vocab_size: 30522 # Vocabulary size for target language
  src_seq_len: 128      # Maximum source sequence length
  tgt_seq_len: 128      # Maximum target sequence length
  d_model: 512          # Embedding dimension
  num_layers: 6         # Number of encoder/decoder layers
  num_heads: 8          # Number of attention heads
  dropout: 0.1          # Dropout rate
  d_ff: 2048            # Feed-forward hidden dimension
  pad_idx: 0            # Padding index for loss computation
```
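How these values are consumed is up to the training and evaluation scripts; as a rough sketch (not necessarily how `train/train.py` does it), a YAML file like the one above can be read with PyYAML and used as plain dictionaries:

```python
import torch
import yaml

# Load the example configuration shown above.
with open("config/config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["d_model"])        # 512
print(cfg["training"]["batch_size"])  # 32

# Fall back to CPU if CUDA was requested but is unavailable.
device = torch.device(cfg["training"]["device"] if torch.cuda.is_available() else "cpu")
```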
This project is licensed under the MIT License. See the LICENSE file for details.

- The original Transformer model was introduced in the paper "Attention Is All You Need" by Vaswani et al.
- This implementation uses the Hugging Face Transformers library for tokenization and model handling.
- Replace `https://github.com/yourusername/transformer-seq2seq.git` with the actual URL of your GitHub repository.
- Adjust the sections and content as necessary to fit your project's specifics, including any additional features or instructions.
- Ensure that the `requirements.txt` file is created and includes all necessary dependencies for your project.