A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on Mechanistic Interpretability, a growing subfield of machine learning interpretability research that aims to reverse-engineer neural networks into understandable computational components. This repository serves as a comprehensive, well-organized knowledge base for researchers, engineers, and enthusiasts working to uncover the inner workings of modern AI systems, particularly large language models (LLMs).
To keep the community current with the latest developments, the repository is automatically updated with recent mechanistic interpretability papers from arXiv, providing timely access to the new techniques, discoveries, and frameworks that are shaping the future of model transparency and alignment.
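The repository's actual update pipeline is not included here, but as a rough sketch of what such automation can look like, the snippet below pulls recent titles from the public arXiv API. The query string, result limit, and the helper name `fetch_recent_papers` are illustrative assumptions rather than details of this project.

```python
# Illustrative sketch (not this repository's pipeline) of fetching recent
# mechanistic-interpretability papers from the public arXiv API.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API


def fetch_recent_papers(query='all:"mechanistic interpretability"', max_results=20):
    params = urllib.parse.urlencode({
        "search_query": query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    # Each Atom <entry> is one paper; keep just the title and the abstract link.
    return [
        (entry.findtext(f"{ATOM}title").strip(), entry.findtext(f"{ATOM}id"))
        for entry in feed.findall(f"{ATOM}entry")
    ]


if __name__ == "__main__":
    for title, link in fetch_recent_papers():
        print(f"- {title} ({link})")
```

Each result is printed as a markdown bullet, mirroring the paper list format used further down in this README.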
Note
📢 Announcement: Our paper from AIT Lab is now available on SSRN!
Title: Bridging the Black Box: A Survey on Mechanistic Interpretability in AI
If you find this paper interesting, please consider citing our work. Thank you for your support!
```bibtex
@article{somvanshi2025bridging,
  title={Bridging the Black Box: A Survey on Mechanistic Interpretability in AI},
  author={Somvanshi, Shriyank and Islam, Md Monzurul and Rafe, Amir and Tusti, Anannya Ghosh and Chakraborty, Arka and Baitullah, Anika and Chowdhury, Tausif Islam and Alnawmasi, Nawaf and Dutta, Anandi and Das, Subasish},
  journal={Available at SSRN 5345552},
  year={2025}
}
```

Whether you are investigating the circuits behind in-context learning, decoding attention heads in transformers, or exploring interpretability tools like activation patching and causal tracing, this collection serves as a centralized hub for everything related to Mechanistic Interpretability — enriched by original peer-reviewed contributions and hands-on research from the broader interpretability community.
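For readers new to the tools named above, here is a minimal sketch of activation patching using the open-source TransformerLens library; the model choice (GPT-2 small), prompts, patched layer, and hook point are illustrative assumptions, not a prescription from this repository.

```python
# Minimal activation-patching sketch with TransformerLens
# (assumed installed via: pip install transformer-lens).
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Clean and corrupted prompts that tokenize to the same length.
clean_tokens = model.to_tokens("The city of Rome is the capital of")
corrupt_tokens = model.to_tokens("The city of Paris is the capital of")

# Run the clean prompt once and cache every intermediate activation.
_, clean_cache = model.run_with_cache(clean_tokens)

layer = 6
hook_name = utils.get_act_name("resid_pre", layer)  # residual stream entering layer 6


def patch_resid(activation, hook):
    # Replace the corrupted residual stream at this layer with the clean one.
    return clean_cache[hook_name]


# Forward the corrupted prompt while patching in the clean activation.
patched_logits = model.run_with_hooks(
    corrupt_tokens, fwd_hooks=[(hook_name, patch_resid)]
)

# If the patched layer carries the relevant information, the logit for the
# clean answer (" Italy") should recover on the corrupted prompt.
italy = model.to_single_token(" Italy")
print("patched logit for ' Italy':", patched_logits[0, -1, italy].item())
```

In practice, such experiments sweep the patched layer and token position and measure how much of the clean answer's logit is recovered, which is how patching localizes the components that carry the relevant information.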
Last updated: January 28, 2026 at 01:19:34 AM UTC
- Mechanistic Decomposition of Sentence Representations
- Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy
- Rethinking Crowd-Sourced Evaluation of Neuron Explanations
- MIB: A Mechanistic Interpretability Benchmark
- Training Superior Sparse Autoencoders for Instruct Models
- RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
- Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
- A CRISP approach to QSP: XAI enabling fit-for-purpose models
- Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws
- Tug-of-war between idiom's figurative and literal meanings in LLMs
- TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
- Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
- Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective
- The Vector Grounding Problem
- An analytic theory of creativity in convolutional diffusion models
- Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
- Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization
- Is the end of Insight in Sight?
- Forecasting Seasonal Influenza Epidemics with Physics-Informed Neural Networks
- Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models
- Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
- Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
- Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
- Do Large Language Models (Really) Need Statistical Foundations?
- Identifying interactions across brain areas while accounting for individual-neuron dynamics with a Transformer-based variational autoencoder
- SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
- Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
- : Interpreting and leveraging semantic information in diffusion models
- Circuit Stability Characterizes Language Model Generalization
- Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
- Planning in a recurrent neural network that plays Sokoban
- Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability
- BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
- Enhancing Automated Interpretability with Output-Centric Feature Descriptions
- Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks
- Sparsification and Reconstruction from the Perspective of Representation Geometry
- Mitigating Overthinking in Large Reasoning Models via Manifold Steering
- Understanding Synthetic Context Extension via Retrieval Heads
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention
- MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation
- The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions
- Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
- From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance
- Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks
- MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
- The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
- DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces
- Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models
- The Origins of Representation Manifolds in Large Language Models
- The Remarkable Robustness of LLMs: Stages of Inference?
- Monet: Mixture of Monosemantic Experts for Transformers
- Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
- PAL: Probing Audio Encoders via LLMs -- A Study of Information Transfer from Audio Encoders to LLMs
- Revisiting Transformers with Insights from Image Filtering
- Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
- Understanding the Repeat Curse in Large Language Models from a Feature Perspective
- Sectoral Coupling in Linguistic State Space
- SAE-V: Interpreting Multimodal Models for Enhanced Alignment
- Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders
- Interpretability and Generalization Bounds for Learning Spatial Physics
- A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
- Mechanistic Interpretability in the Presence of Architectural Obfuscation
- Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)
- Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
- Validating Mechanistic Interpretations: An Axiomatic Approach
- Six Fallacies in Substituting Large Language Models for Human Participants
- Mechanistic Interpretability of Diffusion Models: Circuit-Level Analysis and Causal Validation
- From memories to maps: Mechanisms of in context reinforcement learning in transformers
- Measuring and Guiding Monosemanticity
- Emergent collective dynamics from motile photokinetic organisms
- Mechanistic Interpretability Needs Philosophy
- Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
- Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
- Bilinear MLPs enable weight-based mechanistic interpretability
- From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers
- Amortizing personalization in virtual brain twins
- Stochastic Parameter Decomposition
- Understanding Verbatim Memorization in LLMs Through Circuit Discovery
- Mechanistic Interpretability of Emotion Inference in Large Language Models
- Data-Driven Multiscale Topology Optimization of Spinodoid Architected Materials with Controllable Anisotropy
- SAFER: Probing Safety in Reward Models with Sparse Autoencoder
- Prompting as Scientific Inquiry
- Emerging AI Approaches for Cancer Spatial Omics
- Learning Modular Exponentiation with Transformers
- Constraint-Guided Symbolic Regression for Data-Efficient Kinetic Model Discovery
- Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability
- Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks
- Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs
- Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations
- Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
- Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs
- Dynamical Archetype Analysis: Autonomous Computation
- A statistical approach to latent dynamic modeling with differential equations
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
- Mechanistic Indicators of Understanding in Large Language Models
- Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers
- Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations
- Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
- A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma
- Propensity score weighting across counterfactual worlds: longitudinal effects under positivity violations
- Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
- CytoSAE: Interpretable Cell Embeddings for Hematology
- FADE: Why Bad Descriptions Happen to Good Features
- Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
- Teach Old SAEs New Domain Tricks with Boosting
- Insights into a radiology-specialised multimodal large language model with sparse autoencoders
- Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
- Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
- Understanding Matching Mechanisms in Cross-Encoders
- Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models
- Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval
- Deep Learning for Blood-Brain Barrier Permeability Prediction
- Residual Koopman Model Predictive Control for Enhanced Vehicle Dynamics with Small On-Track Data Input
- CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing
- HumorDB: Can AI understand graphical humor?
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
- A nonparametric approach to practical identifiability of nonlinear mixed effects models
- Large Language Models Are Human-Like Internally
- How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation
- Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses
- Semiclassical Spin Exchange via Temperature-Dependent Transition States
- Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders
- Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes
- How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding
- Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
- Modeling the Temperature-Humidity Coupling Dynamics of Soybean Pod Borer Population and Assessing the Predictive Performance of the PCM-NN Algorithm
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
- Covariance spectrum in nonlinear recurrent neural networks
- Discovery of Disease Relationships via Transcriptomic Signature Analysis Powered by Agentic AI
- Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
- Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with (IA)^3 for Localized Factual Modulation and Catastrophic Forgetting Mitigation
- Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs
- Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- BiasGym: Fantastic Biases and How to Find (and Remove) Them
- From Transformer to Biology: A Hierarchical Model for Attention in Complex Problem-Solving
- How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
- Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
- BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
- eDIF: A European Deep Inference Fabric for Remote Interpretability of LLM
- Maximum Entropy Models for Unimodal Time Series: Case Studies of Universe 25 and St. Matthew Island
- Mantis: A Simulation-Grounded Foundation Model for Disease Forecasting
- CALYPSO: Forecasting and Analyzing MRSA Infection Patterns with Community and Healthcare Transmission Dynamics
- Counterfactual Probabilistic Diffusion with Expert Models
- Modeling GRNs with a Probabilistic Categorical Framework
- Disentangling concept semantics via multilingual averaging in Sparse Autoencoders
- LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts
- From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
- Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth
- Beyond Transcription: Mechanistic Interpretability in ASR
- Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
- KL-based self-distillation for large language models
- Adversarial Examples Are Not Bugs, They Are Superposition
- LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components
- Disentangling Polysemantic Neurons with a Null-Calibrated Polysemanticity Index and Causal Patch Interventions
- Rethinking scale in network neuroscience: Contributions and opportunities at the nanoscale
- Do VLMs Have Bad Eyes? Diagnosing Compositional Failures via Mechanistic Interpretability
- Biologically Disentangled Multi-Omic Modeling Reveals Mechanistic Insights into Pan-Cancer Immunotherapy Resistance
- Tracing Positional Bias in Financial Decision-Making: Mechanistic Insights from Qwen2.5
- Linear Power System Modeling and Analysis Across Wide Operating Ranges: A Hierarchical Neural State-Space Equation Approach
- Even Heads Fix Odd Errors: Mechanistic Discovery and Surgical Repair in Transformer Attention
- RelP: Faithful and Efficient Circuit Discovery via Relevance Patching
- Prompt the Unseen: Evaluating Visual-Language Alignment Beyond Supervision
- Mechanistic interpretability for steering vision-language-action models
- Can LLMs Lie? Investigation beyond Hallucination
- Challenges in Understanding Modality Conflict in Vision-Language Models
- Non-Linear Model-Based Sequential Decision-Making in Agriculture
- Preserving Bilinear Weight Spectra with a Signed and Shrunk Quadratic Activation Function
- Pulling Back the Curtain on ReLU Networks
- Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces
- Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects
- Interpreting Transformer Architectures as Implicit Multinomial Regression
- ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders
- Towards explainable decision support using hybrid neural models for logistic terminal automation
- Measuring Uncertainty in Transformer Circuits with Effective Information Consistency
- Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Hop Arithmetic Reasoning
- Data-driven discovery of dynamical models in biology
- Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
- The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
- Interpretability as Alignment: Making Internal Understanding a Design Principle
- Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
- Decoding the Stability of Transition-Metal Alloys with Theory-infused Deep Learning
- An Agentic AI Workflow to Simplify Parameter Estimation of Complex Differential Equation Systems
- Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal
- The power of dynamic causality in observer-based design for soft sensor applications
- Modelling Under-Reported Data: Pitfalls of Naïve Approaches and a New Statistical Framework for Epidemic Curve Reconstruction
- The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
- Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content
- Swarm Intelligence for Chemical Reaction Optimisation
- Learning Mechanistic Subtypes of Neurodegeneration with a Physics-Informed Variational Autoencoder Mixture Model
- Unified Spatiotemporal Physics-Informed Learning (USPIL): A Framework for Modeling Complex Predator-Prey Dynamics
- Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
- DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction
- Modeling Transformers as complex networks to analyze learning dynamics
- Bayesian Calibration and Model Assessment of Cell Migration Dynamics with Surrogate Model Integration
- Learning From Simulators: A Theory of Simulation-Grounded Learning
- Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing
- Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
- Tikhonov-Fenichel Reductions and their Application to a Novel Modelling Approach for Mutualism
- A Machine Learning Framework for Pathway-Driven Therapeutic Target Discovery in Metabolic Disorders
- From Parameters to Performance: A Data-Driven Study on LLM Structure and Development
- Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework
- BioBO: Biology-informed Bayesian Optimization for Perturbation Design
- Interpreting ResNet-based CLIP via Neuron-Attention Decomposition
- Integrating Mechanistic Modeling and Machine Learning to Study CD4+/CD8+ CAR-T Cell Dynamics with Tumor Antigen Regulation
- Fine-Tuning is Subgraph Search: A New Lens on Learning Dynamics
- Binary Autoencoder for Mechanistic Interpretability of Large Language Models
- CLUE: Conflict-guided Localization for LLM Unlearning Framework
- Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates
- GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
- Towards Atoms of Large Language Models
- RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing
- Concept-SAE: Active Causal Probing of Visual Model Behavior
- Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
- Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
- Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
- Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
- LLM Interpretability with Identifiable Temporal-Instantaneous Representation
- Mechanistic Fine-tuning for In-context Learning
- Bayesian Inference for Sexual Contact Networks Using Longitudinal Survey Data
- Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
- Thrust based on changes in angular momentum
- Latent Concept Disentanglement in Transformer-based Language Models
- ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
- Minimalist Explanation Generation and Circuit Discovery
- Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
- Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
- Excitonic Energy Transfer in Red Algal Photosystem I Reveals an Evolutionary Bridge between Cyanobacteria and Plants
- Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
- From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
- Measuring Sparse Autoencoder Feature Sensitivity
- Cyclic Ablation: Testing Concept Localization against Functional Regeneration in AI
- Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG
- Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
- Feature Identification via the Empirical NTK
- Commutative algebra neural network reveals genetic origins of diseases
- Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document
- BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning
- When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
- Integrative modelling of biomolecular dynamics
- Interpreting Language Models Through Concept Descriptions: A Survey
- Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
- Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
- Understanding Addition and Subtraction in Transformers
- Mechanistic Interpretability of Code Correctness in LLMs via Sparse Autoencoders
- Learning Explicit Single-Cell Dynamics Using ODE Representations
- Evaluation Framework for Highlight Explanations of Context Utilisation in Language Models
- Towards CONUS-Wide ML-Augmented Conceptually-Interpretable Modeling of Catchment-Scale Precipitation-Storage-Runoff Dynamics
- Deep learning for flash drought forecasting and interpretation
- Mechanistic Interpretability of Socio-Political Frames in Language Models
- Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
- The Argument is the Explanation: Structured Argumentation for Trust in Agents
- Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis
- Decomposing Attention To Find Context-Sensitive Neurons
- SoC-DT: Standard-of-Care Aligned Digital Twins for Patient-Specific Tumor Dynamics
- Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
- Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
- Mechanistic-statistical inference of mosquito dynamics from mark-release-recapture data
- Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
- The Logical Implication Steering Method for Conditional Interventions on Transformer Generation
- Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
- Causal Abstractions, Categorically Unified
- Visual Representations inside the Language Model
- BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
- Tug-of-war between idioms' figurative and literal interpretations in LLMs
- ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
- Advancing AI Research Assistants with Expert-Involved Learning
- Biology-driven assessment of deep learning super-resolution imaging of the porosity network in dentin
- Iterated Agent for Symbolic Regression
- RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
- InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
- Egocentric Visual Navigation through Hippocampal Sequences
- Causality ≠ Decodability, and Vice Versa: Lessons from Interpreting Counting ViTs
- Impact of Oxygen on DNA Damage Distribution in 3D Genome and Its Correlation to Oxygen Enhancement Ratio under High LET Irradiation
- Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
- QLENS: Towards A Quantum Perspective of Language Transformers
- A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
- Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
- Data-Driven Topology Optimization for Multiscale Biomimetic Spinodal Design
- Physical models of embryonic epithelial healing: A review
- Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
- Multi-Scale Probabilistic Generation Theory: A Unified Information-Theoretic Framework for Hierarchical Structure in Large Language Models
- Constrained belief updates explain geometric structures in transformer representations
- CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions
- Functional and parametric identifiability for universal differential equations applied to chemical reaction networks
- Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning
- Circuit Insights: Towards Interpretability Beyond Activations
- Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift
- Game-Theoretic Discovery of Quantum Error-Correcting Codes Through Nash Equilibria
- Intrinsic Self-Correction in LLMs: Towards Explainable Prompting via Mechanistic Interpretability
- DePass: Unified Feature Attributing by Simple Decomposed Forward Pass
- Extracting Rule-based Descriptions of Attention Features in Transformers
- How role-play shapes relevance judgment in zero-shot LLM rankers
- Layer Specialization Underlying Compositional Reasoning in Transformers
- Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations
- Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling
- Atomic Literary Styling: Mechanistic Manipulation of Prose Generation in Neural Language Models
- Base Models Know How to Reason, Thinking Models Learn When
- I Spy With My Model's Eye: Visual Search as a Behavioural Test for MLLMs
- A Class of Markovian Self-Reinforcing Processes with Power-Law Distributions
- Prospects for Using Artificial Intelligence to Understand Intrinsic Kinetics of Heterogeneous Catalytic Reactions
- Foundation Models for Discovery and Exploration in Chemical Space
- Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
- MolBridge: Atom-Level Joint Graph Refinement for Robust Drug-Drug Interaction Event Prediction
- Stream: Scaling up Mechanistic Interpretability to Long Context in LLMs via Sparse Attention
- Some Attention is All You Need for Retrieval
- Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers
- ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
- Mapping Faithful Reasoning in Language Models
- Transformer brain encoders explain human high-level visual responses
- Mechanistic Interpretability for Neural TSP Solvers
- Mechanism-Guided Residual Lifting and Control Consistent Modeling for Pneumatic Drying Processes
- Overshoot-resolved transition modeling based on field inversion and symbolic regression
- Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
- PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
- Interpreting and Mitigating Unwanted Uncertainty in LLMs
- Sparsity and Superposition in Mixture of Experts
- Mechanistic Interpretability of RNNs emulating Hidden Markov Models
- FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
- Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
- Large Language Models Report Subjective Experience Under Self-Referential Processing
- RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching
- Chain-of-Thought Hijacking
- In Defence of Post-hoc Explainability
- MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders
- BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
- StreetMath: Study of LLMs' Approximation Behaviors
- Modelling ion channels with a view towards identifiability
- Atlas-Alignment: Making Interpretability Transferable Across Language Models
- Pregnancy as a dynamical paradox: robustness, control and birth onset
- Space as Time Through Neuron Position Learning
- TRISKELION-1: Unified Descriptive-Predictive-Generative AI
- Neural Transparency: Mechanistic Interpretability Interfaces for Anticipating Model Behaviors for Personalized AI
- Automatically Finding Rule-Based Neurons in OthelloGPT
- Causal Graph Neural Networks for Healthcare
- Interpreting Emergent Features in Deep Learning-based Side-channel Analysis
- LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS
- Addressing divergent representations from causal interventions on neural networks
- APP: Accelerated Path Patching with Task-Specific Pruning
- SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models
- Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning
- Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation
- The Trilemma of Truth in Large Language Models
- SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs
- Automated Circuit Interpretation via Probe Prompting
- Rank-1 LoRAs Encode Interpretable Reasoning Signals
- Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm
- Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
- Fractional neural attention for efficient multiscale sequence processing
- Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
- Decomposition of Small Transformer Models
- Weight-sparse transformers have interpretable circuits
- An Automated Framework for Analyzing Structural Evolution in On-the-fly Non-adiabatic Molecular Dynamics Using Autoencoder and Multiple Molecular Descriptors
- Bridging the genotype-phenotype gap with generative artificial intelligence
- From Black-Box to White-Box: Control-Theoretic Neural Network Interpretability
- Explainable deep learning framework for cancer therapeutic target prioritization leveraging PPI centrality and node embeddings
- Comment on "Repair of DNA Double-Strand Breaks Leaves Heritable Impairment to Genome Function"
- Which Sparse Autoencoder Features Are Real? Model-X Knockoffs for False Discovery Rate Control
- Differentiable Electrochemistry: A paradigm for uncovering hidden physical phenomena in electrochemical systems
- nnterp: A Standardized Interface for Mechanistic Interpretability of Transformers
- Judicial Sentencing Prediction Based on Hybrid Models and Two-Stage Learning Algorithms
- Chromatographic Peak Shape from Stochastic Model: Analytic Time-Domain Expression in Terms of Physical Parameters and Conditions under which Heterogeneity Reduces Tailing
- Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design
- Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks
- Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry
- Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25
- Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models
- Vector Arithmetic in Concept and Token Subspaces
- The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems
- Understanding Counting Mechanisms in Large Language and Vision-Language Models
- BlockCert: Certified Blockwise Extraction of Transformer Mechanisms
- Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
- Approximate Bayesian Computation Made Easy: A Practical Guide to ABC-SMC for Dynamical Systems with `pymc`
- Mechanistic Interpretability for Transformer-based Time Series Classification
- Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
- Interpretability for Time Series Transformers using A Concept Bottleneck Framework
- Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
- A race to belief: How Evidence Accumulation shapes trust in AI and Human informants
- FoldSAE: Learning to Steer Protein Folding Through Sparse Representations
- Unsupervised decoding of encoded reasoning using language model interpretability
- TrendGNN: Towards Understanding of Epidemics, Beliefs, and Behaviors
- VCWorld: A Biological World Model for Virtual Cell Simulation
- EXP-CAM: Explanation Generation and Circuit Discovery Using Classifier Activation Matching
- HyperADRs: A Hierarchical Hypergraph Framework for Drug-Gene-ADR Prediction
- ProteinPNet: Prototypical Part Networks for Concept Learning in Spatial Proteomics
- Translating Measures onto Mechanisms: The Cognitive Relevance of Higher-Order Information
- Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks
- Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability
- Approximate Bayesian Inference on Mechanisms of Network Growth and Evolution
- AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure-Property Associations
- Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
- Neural Policy Composition from Free Energy Minimization
- Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
- Minuet: A Diffusion Autoencoder for Compact Semantic Compression of Multi-Band Galaxy Images
- Sparse Attention Post-Training for Mechanistic Interpretability
- Mechanistic Interpretability of Antibody Language Models Using SAEs
- Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
- On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability
- A network-driven framework for enhancing gene-disease association studies in coronary artery disease
- XMCQDPT2-Fidelity Transfer-Learning Potentials and a Wavepacket Oscillation Model with Power-Law Decay for Ultrafast Photodynamics
- ExPUFFIN: Thermodynamic Consistent Viscosity Prediction in an Extended Path-Unifying Feed-Forward Interfaced Network
- Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis
- Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs
- A microstructural rheological model for transient creep in polycrystalline ice
- Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
- Bayesian Co-Navigation of a Computational Physical Model and AFM Experiment to Autonomously Survey a Combinatorial Materials Library
- Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment
- SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation
- Interpretable and Steerable Concept Bottleneck Sparse Autoencoders
- Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders
- Physics-informed neural network for fatigue life prediction of irradiated austenitic and ferritic/martensitic steels
- ConceptGuard: Neuro-Symbolic Safety Guardrails via Sparse Interpretable Jailbreak Concepts
- Learning continuous SOC-dependent thermal decomposition kinetics for Li-ion cathodes using KA-CRNNs
- Who is In Charge? Dissecting Role Conflicts in Instruction Following
- SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks
- AI Epidemiology: achieving explainable AI through expert oversight patterns
- Machine Learning Framework for Thrombosis Risk Prediction in Rotary Blood Pumps
- RP-CATE: Recurrent Perceptron-based Channel Attention Transformer Encoder for Industrial Hybrid Modeling
- R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression
- Faithful and Stable Neuron Explanations for Trustworthy Mechanistic Interpretability
- Block-Recurrent Dynamics in Vision Transformers
- Sign-Aware Multistate Jaccard Kernels and Geometry for Real and Complex-Valued Signals
- The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
- Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
- Information is localized in growing network models
- Mono- and Polyauxic Growth Kinetics: A Semi-Mechanistic Framework for Complex Biological Dynamics
- Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation
- Mechanistic Analysis of Circuit Preservation in Federated Learning
- A Paradigm Shift in Human Neuroscience Research: Progress, Prospects, and a Proof of Concept for Population Neuroscience
- EvoXplain: When Machine Learning Models Agree on Predictions but Disagree on Why -- Measuring Mechanistic Multiplicity Across Training Runs
- Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability
- BIOME-Bench: A Benchmark for Biomolecular Interaction Inference and Multi-Omics Pathway Mechanism Elucidation from Scientific Literature
- Connecting strain rate dependence of fcc metals to dislocation avalanche signatures
- Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features
- "X-ray Coulomb Counting" to understand electrochemical systems
- Bridging Visual Intuition and Chemical Expertise: An Autonomous Analysis Framework for Nonadiabatic Dynamics Simulations via Mentor-Engineer-Student Collaboration
- Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
- Emergent World Beliefs: Exploring Transformers in Stochastic Games
- How much neuroscience does a neuroscientist need to know?
- Trustworthy Data-Driven Wildfire Risk Prediction and Understanding in Western Canada
- Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding
- Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
- When the Coffee Feature Activates on Coffins: An Analysis of Feature Extraction and Steering for Mechanistic Interpretability
- Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
- Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control
- A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns
- Local Multimodal Dynamics in Mixed Ionic-Electronic Conductors and Their Fingerprints in Organic Electrochemical Transistor Operation
- When Models Manipulate Manifolds: The Geometry of a Counting Task
- Interpreting Transformers Through Attention Head Intervention
- Analytical review of nanoplastic bioaccumulation data and a unified toxicokinetic model: from teleosts to human brain
- Molecular signatures of pressure-induced phase transitions in a lipid bilayer
- A Backpropagation-Free Feedback-Hebbian Network for Continual Learning Dynamics
- AlignSAE: Concept-Aligned Sparse Autoencoders
- Physics-constrained Gaussian Processes for Predicting Shockwave Hugoniot Curves
- Time Travel Engine: A Shared Latent Chronological Manifold Enables Historical Navigation in Large Language Models
- Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
- LLM-Powered Social Digital Twins: A Framework for Simulating Population Behavioral Response to Policy Interventions
- Vocabulary Expansion of Large Language Models via Kullback-Leibler-Based Self-Distillation
- Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
- Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models
- Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection
- Dedifferentiation stabilizes stem cell lineages: From CTMC to diffusion theory and thresholds
- An Epidemiological Modeling Take on Religion Dynamics
- From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda
- Metabolomic Biomarker Discovery for ADHD Diagnosis Using Interpretable Machine Learning
- Mechanistic Learning for Survival Prediction in NSCLC Using Routine Blood Biomarkers and Tumor Kinetics
- From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models
- Reasoning Models Generate Societies of Thought
- BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
- Patterning: The Dual of Interpretability
- Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition
- The Physics of the Dancing *Deity*: Coupled Oscillators in Himalayan Processions
- Race, Ethnicity and Their Implication on Bias in Large Language Models
- A Machine Learning-Based Surrogate EKMA Framework for Diagnosing Urban Ozone Formation Regimes: Evidence from Los Angeles
- Long-term prediction of ENSO with physics-guided Deep Echo State Networks
- Persistent Sheaf Laplacian Analysis of Protein Stability and Solubility Changes upon Mutation
- Single-Node Wilson-Cowan Model Accounts for Speech-Evoked γ-Band Deficits in Schizophrenia
- DiSPA: Differential Substructure-Pathway Attention for Drug Response Prediction
- White-Box mHC: Electromagnetic Spectrum-Aware and Interpretable Stream Interactions for Hyperspectral Image Classification
- Emergence and Evolution of Interpretable Concepts in Diffusion Models
- Latent Causal Diffusions for Single-Cell Perturbation Modeling
- A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Large Language Models
- Interpretability of the Intent Detection Problem: A New Approach
We welcome contributions to this repository! If you have a resource that you believe should be included, please submit a pull request or open an issue. Contributions can include:
- New libraries or tools related to mechanistic interpretability
- Tutorials or guides that help users understand and implement mechanistic interpretability techniques
- Research papers that advance the field of mechanistic interpretability
- Any other resources that you find valuable for the community
To contribute:
- Fork the repository.
- Create a new branch for your changes.
- Make your changes and commit them with a clear message.
- Push your changes to your forked repository.
- Submit a pull request to the main repository.
Before contributing, take a look at the existing resources to avoid duplicates.
This repository is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material, provided you give appropriate credit, link to the license, and indicate if changes were made.