Quantization Recipes for NLP Models

This repository contains a series of Jupyter notebooks demonstrating various quantization and optimization techniques for NLP models. These notebooks provide practical implementations of state-of-the-art methods for model compression and efficient inference.

Notebooks

  1. 01_large_language_model_optimization.ipynb: Optimizing a large language model for low-latency inference.
  2. 02_vision_transformer_edge_optimization.ipynb: Fine-tuning and quantizing a Vision Transformer for edge devices.
  3. 03_bert_question_answering_quantization.ipynb: Quantizing a BERT-based model for question answering tasks (a minimal quantization sketch follows this list).
  4. 04_multitask_nlp_quantization.ipynb: Transfer learning and quantization for multi-task NLP.
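
For a flavor of what these notebooks cover (notebook 3 in particular), here is a minimal sketch of dynamic quantization using PyTorch's built-in torch.quantization.quantize_dynamic API. The toy model below is a placeholder for illustration, not code taken from the notebooks:

import torch
import torch.nn as nn

# Placeholder model standing in for a real NLP model such as BERT.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time; only nn.Linear layers are converted.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized_model)  # Linear layers are replaced by dynamically quantized versions

The same call works on a Hugging Face BERT model's linear layers; dynamic quantization is the lowest-effort entry point because it needs no calibration data and no retraining.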

Key Features

  • Mixed precision training (see the training-loop sketch after this list)
  • Post-training quantization (PTQ)
  • Quantization-aware fine-tuning (QAF)
  • Dynamic quantization
  • Pruning techniques
  • Layer fusion
  • Efficient attention mechanisms
  • Knowledge distillation
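
As a taste of the first feature, here is a minimal mixed precision training loop built on torch.cuda.amp. The model, data, and hyperparameters are toy placeholders and the loop assumes a CUDA-capable GPU; it is a sketch of the technique, not code from the notebooks:

import torch
import torch.nn as nn

# Toy model and synthetic data standing in for a real NLP setup; requires a CUDA GPU.
model = nn.Linear(128, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid fp16 gradient underflow

for step in range(10):
    inputs = torch.randn(32, 128, device="cuda")
    labels = torch.randint(0, 2, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in float16 where it is safe
        loss = criterion(model(inputs), labels)
    scaler.scale(loss).backward()    # backpropagate through the scaled loss
    scaler.step(optimizer)           # unscale gradients, then update the weights
    scaler.update()                  # adjust the scale factor for the next step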

Cheatsheet

Check out my cheatsheet, "Quantization and Precision Tuning for Optimization", for more details. Feel free to share :)

Getting Started

  1. Clone this repository

git clone https://github.com/ethanshenley/Quantization-Cookbook.git
cd Quantization-Cookbook

  2. Install the required packages

pip install -r requirements.txt

  3. Open the Jupyter notebooks and run them!

Contributing

Contributions are welcome! Please feel free to submit a pull request.
