This project implements a translation model using the Transformer architecture, based on the groundbreaking paper "Attention Is All You Need" (Vaswani et al., 2017). The implementation focuses on English-to-French translation while offering an easy-to-understand PyTorch implementation of the architecture.
The Transformer architecture revolutionized natural language processing by eliminating the need for recurrent or convolutional neural networks, instead relying entirely on attention mechanisms to capture relationships between words. This implementation showcases three key innovations:
- Multi-Head Self-Attention: Allowing the model to simultaneously attend to information from different representation subspaces
- Encoder-Decoder Architecture: Processing the input sequence and generating the output sequence using stacked attention layers
- Positional Encoding: Incorporating sequence order information without recurrence
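As an illustration of the last point, here is a minimal sketch of the sinusoidal positional encoding described in the paper. The class name and tensor shapes are illustrative and may differ from what `model.py` actually uses.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sine/cosine position information to token embeddings (illustrative sketch)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add encodings for the first seq_len positions
        return x + self.pe[: x.size(1)]
```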
```bash
git clone https://github.com/yourusername/transformer-translation.git
cd transformer-translation
pip install -r requirements.txt
```
- Tokenization (`tokenisers.py`):
  - Word-level tokenization with special tokens (PAD, UNK, START, END)
  - Vocabulary creation with frequency-based filtering
  - Text encoding and decoding utilities
- Transformer Architecture (`model.py`):
  - Multi-head attention implementation with separate query, key, and value projections (see the sketch after this list)
  - Encoder and decoder stacks with residual connections
  - Position-wise feed-forward networks
  - Positional encoding implementation
- Training Pipeline (`train.py`):
  - Custom dataset class for handling parallel text data
  - Training loop with learning rate scheduling
  - Validation and model checkpointing
  - Generation utilities for inference
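For readers who want a feel for the attention component before opening `model.py`, below is a minimal, self-contained sketch of multi-head self-attention with separate query, key, and value projections. Names and shapes are illustrative and are not guaranteed to match the repository code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product attention computed in parallel over several heads (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        batch, seq_len, d_model = x.shape
        # Project and reshape to (batch, heads, seq_len, d_head)
        q = self.q_proj(x).view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention scores
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))
        # Concatenate heads and project back to d_model
        out = (attn @ v).transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.out_proj(out)
```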
We have also provided a notebook, Transformer_Translation.ipynb, which explains how the model and the training procedure work.
The default model configuration includes:
- 6 encoder layers
- 3 pre-cross-attention decoder layers
- 3 cross-attention decoder layers
- 8 attention heads
- 256 embedding dimensions
- Dropout rate of 0.1
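As a rough sketch of how these defaults might be expressed with the `TransformerConfig` class used in the inference example further down, here is a hypothetical configuration object; the actual field names in `model.py` may differ.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Hypothetical field names mirroring the defaults listed above
    n_encoder_layers: int = 6
    n_pre_cross_decoder_layers: int = 3
    n_cross_decoder_layers: int = 3
    n_heads: int = 8
    d_model: int = 256
    dropout: float = 0.1

config = TransformerConfig()  # default configuration
```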
To train the model:

```bash
python train.py
```

The training script includes:
- Dynamic learning rate adjustment
- Gradient clipping
- Model checkpointing
- Validation monitoring
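The following is a minimal sketch of how such a loop can combine a learning-rate scheduler with gradient clipping; the actual `train.py` may be organised differently, and the model call signature and data loader used here are placeholders.

```python
import torch

def train_one_epoch(model, loader, optimizer, scheduler, criterion, device, clip_norm=1.0):
    """One training epoch with gradient clipping and per-step learning-rate scheduling (sketch)."""
    model.train()
    total_loss = 0.0
    for src_ids, tgt_ids in loader:
        src_ids, tgt_ids = src_ids.to(device), tgt_ids.to(device)
        optimizer.zero_grad()
        # Teacher forcing: predict token t+1 from target tokens up to t (assumed model signature)
        logits = model(src_ids, tgt_ids[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_ids[:, 1:].reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)  # gradient clipping
        optimizer.step()
        scheduler.step()                                               # dynamic learning rate adjustment
        total_loss += loss.item()
    return total_loss / len(loader)
```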
The model expects parallel text data in CSV format with columns for source (English) and target (French) sentences. The data should be preprocessed to:
- Convert text to lowercase
- Add appropriate spacing around punctuation
- Remove special characters
- Normalize whitespace
A suitable example dataset can be found at https://www.kaggle.com/datasets/devicharith/language-translation-englishfrench.
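Here is a minimal sketch of the kind of preprocessing described above; the regular expressions are illustrative and not taken from the repository.

```python
import re

def preprocess(text: str) -> str:
    """Lowercase, pad punctuation with spaces, drop special characters, normalize whitespace."""
    text = text.lower()
    text = re.sub(r"([.!?,;:])", r" \1 ", text)                           # add spacing around punctuation
    text = re.sub(r"[^a-zàâçéèêëîïôûùüÿœæ0-9.!?,;:'\s-]", " ", text)      # remove special characters
    text = re.sub(r"\s+", " ", text).strip()                              # normalize whitespace
    return text

print(preprocess("Hello, world!  How are you?"))  # -> "hello , world ! how are you ?"
```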
To translate text using a trained model:

```python
import torch
from model import Transformer, TransformerConfig

# Load model and tokenizers
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = TransformerConfig()  # use the same settings the checkpoint was trained with
model = Transformer(config).to(device)
model.load_state_dict(torch.load('models/best_model.pt')['model_state_dict'])
model.eval()

# Generate translation
# encoded_input: tensor of source token ids produced by the tokenizer
translated_ids = model.generate(
    src_ids=encoded_input,
    max_new_tokens=128,
    temperature=1.0,
    top_k=50
)
```

The implementation includes several optimizations:
- Parallel computation in multi-head attention
- Efficient batch processing of sequences
- Memory-efficient attention masking
- Gradient clipping for stable training
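As an example of the masking point, here is a sketch of how padding and causal masks are often built and combined for the decoder; the mask shapes assumed here (broadcast over attention heads) may differ from the repository's convention.

```python
import torch

def build_masks(src_ids, tgt_ids, pad_id=0):
    """Build a padding mask for the source and a combined causal+padding mask for the target (sketch)."""
    # Source padding mask: (batch, 1, 1, src_len), broadcast over heads and query positions
    src_mask = (src_ids != pad_id).unsqueeze(1).unsqueeze(2)
    # Causal mask: (tgt_len, tgt_len), lower-triangular so position t only attends to positions <= t
    tgt_len = tgt_ids.size(1)
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool, device=tgt_ids.device))
    # Target mask combines padding and causality via broadcasting
    tgt_mask = (tgt_ids != pad_id).unsqueeze(1).unsqueeze(2) & causal
    return src_mask, tgt_mask
```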
- Python 3.7+
- PyTorch 1.7+
- NumPy
- tqdm
Additional dependencies can be found in requirements.txt.
If you use this implementation in your research, please cite:
@article{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Lukasz and Polosukhin, Illia},
journal={Advances in neural information processing systems},
volume={30},
year={2017}
}