A compilation of resources for keeping up with the latest trends in NLP.
Note: This resource list is a work in progress. More papers and topics will be added regularly. Contributions and suggestions are welcome!
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT-1: Improving Language Understanding by Generative Pre-Training
- GPT-2: Language Models are Unsupervised Multitask Learners
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Longformer: The Long-Document Transformer
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Language Models are Few-Shot Learners - GPT-3 paper
- Attention Is All You Need
- Memory Is All You Need
- Byte-pair Encoding
- The Illustrated Transformer - Blog
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer - MoE paper for LMs
- Fast Transformer Decoding: One Write-Head is All You Need - Multi-Query Attention (MQA) Paper
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints - Grouped Query Attention Paper
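The MQA and GQA papers above differ only in how many key/value heads the query heads share. Below is a minimal NumPy sketch of that sharing pattern, with illustrative names and no masking or projection layers; it is not taken from any of the papers' reference code.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (num_q_heads, seq, d); k, v: (num_kv_heads, seq, d), where num_q_heads is a
    multiple of num_kv_heads. num_kv_heads == num_q_heads recovers multi-head attention
    (MHA); num_kv_heads == 1 recovers multi-query attention (MQA)."""
    num_q_heads, _, d = q.shape
    num_kv_heads = k.shape[0]
    group_size = num_q_heads // num_kv_heads
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group_size                          # query heads in a group share one K/V head
        scores = q[h] @ k[kv].T / np.sqrt(d)          # scaled dot-product scores, (seq, seq)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# 8 query heads sharing 2 K/V heads (GQA); causal masking is omitted for brevity.
q, k, v = np.random.randn(8, 16, 64), np.random.randn(2, 16, 64), np.random.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```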
- Basics of RL - OpenAI
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
- Training Language Models to Follow Instructions with Human Feedback - InstructGPT paper
- Deep Reinforcement Learning from Human Preferences
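Both the InstructGPT paper and "Deep Reinforcement Learning from Human Preferences" fit a reward model on pairwise comparisons with a Bradley-Terry-style logistic loss. A minimal sketch, assuming precomputed scalar scores per completion; the function and variable names are illustrative, not from any reference implementation.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_scores, rejected_scores):
    """chosen_scores / rejected_scores: scalar rewards r(x, y) for the preferred and
    dispreferred completion of each comparison, shape (batch,)."""
    # Maximize the margin between preferred and dispreferred completions.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

loss = reward_model_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
```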
DPO (Direct Preference Optimization):
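DPO ("Direct Preference Optimization: Your Language Model is Secretly a Reward Model") applies the same logistic comparison loss directly to log-probability ratios between the policy and a frozen reference model, skipping the explicit reward model and RL loop. A minimal sketch on precomputed per-sequence log-probabilities; the names and the beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Each argument is a tensor of per-sequence log-probabilities, shape (batch,)."""
    # Implicit rewards are scaled log-ratios between the policy and the frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Logistic loss on the reward margin, as in the pairwise reward-model objective.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-13.0]), torch.tensor([-14.2]))
```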
PPO:
- Proximal Policy Optimization Algorithms
- PPO Docs - OpenAI
- Understanding PPO from First Principles - Blog
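The core of the PPO paper is the clipped surrogate objective, which limits how far the updated policy can move from the behavior policy in a single step. A minimal sketch with illustrative names; the value-function and entropy terms are omitted.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """logp_new / logp_old: log-probs of the taken actions under the current and
    behavior policies; advantages: estimated advantages (e.g. from GAE)."""
    ratio = torch.exp(logp_new - logp_old)                          # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (minimum) surrogate and negate it to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()

loss = ppo_clip_loss(torch.tensor([-1.1, -0.7]), torch.tensor([-1.0, -0.9]),
                     torch.tensor([0.5, -0.2]))
```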
GRPO (Group Relative Policy Optimization):
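GRPO (introduced in the DeepSeekMath paper) drops the learned value function: for each prompt, a group of completions is sampled and each completion's advantage is its reward standardized within that group, which then feeds a PPO-style clipped objective like the one above. A minimal sketch of the group-relative advantages, with illustrative names.

```python
import torch

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: (num_prompts, group_size) scalar rewards, one per sampled completion."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # Standardize each completion's reward against its own group.
    return (rewards - mean) / (std + eps)

adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 1.0]]))
```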
- Basic Mechanistic Interpretability Essay
- Toy Neural Nets with Low-Dimensional Inputs
- Mechanistic Interpretability for AI Safety Review
- A Mathematical Framework for Transformer Circuits
- Circuit Tracing: Revealing Computational Graphs in Language Models
- Scaling Laws for Neural Language Models (see the worked example after this list)
- Scaling Laws for Autoregressive Generative Modeling
- Scaling Laws of Synthetic Data for Language Models
- Scaling Laws for Transfer
- Unified Scaling Laws for Routed Language Models - Scaling laws for MoEs
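As a concrete example of what these papers fit, "Scaling Laws for Neural Language Models" models loss as a power law in non-embedding parameter count, L(N) = (N_c / N)^alpha_N. The constants below are the approximate fits reported in that paper and are used here purely for illustration.

```python
# Approximate values from Kaplan et al. (2020); treat as illustrative.
alpha_N = 0.076
N_c = 8.8e13  # non-embedding parameters

def loss_from_params(n_params):
    """Predicted cross-entropy loss (nats/token) at convergence, in the data-unlimited regime."""
    return (N_c / n_params) ** alpha_N

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e} params -> predicted loss ≈ {loss_from_params(n):.2f}")
```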
- Mixed Precision Training (see the sketch at the end of this list)
- Matrix Multiplication - Nvidia Blog
- Understanding GPU Performance - Nvidia Blog
- How to Train Really Large Models on Many GPUs? - Blog
- Efficiently Scaling Transformer Inference
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
- SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
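The key trick in the "Mixed Precision Training" paper is loss scaling, which keeps small FP16 gradients from underflowing to zero. A minimal PyTorch AMP sketch of the idea; it assumes a CUDA GPU, the model and data are dummies, and AMP is used here as a convenient stand-in for the paper's hand-rolled recipe.

```python
import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()          # dummy model; requires a CUDA GPU
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # dynamic loss scaling to avoid FP16 underflow

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # forward pass runs in reduced precision where safe
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()             # scale the loss, backprop scaled gradients
    scaler.step(optimizer)                    # unscale gradients, skip the step on inf/nan
    scaler.update()                           # adjust the loss scale for the next step
```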