transformerlens

Star

Here are 12 public repositories matching this topic...

yash-srivastava19 / arrakis

Sponsor

Star

Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.

transformer garcon explainable-ai mechanistic-interpretability anthropic transformerlens

Updated Apr 22, 2025
Jupyter Notebook

FarnoushRJ / RelP

Star

[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"

language-model circuit-analysis interpretability explainable-ai interpretable-machine-learning explainability llms mechanistic-interpretability transformerlens neurips-2025

Updated Nov 3, 2025
Python

krnel-ai / krnel-graph

Star

Lightweight representation engineering dataflow operations for agent developers.

transformers pytorch dataflow parquet huggingface huggingface-transformers duckdb pylance mechanistic-interpretability lancedb transformerlens representation-engineering pragmatic-interpretability

Updated Feb 23, 2026
Python

stchakwdev / Pinocchio-Vector-Test

Star

Investigating whether language models encode anticipated social consequences in their activations. Uses a 2x2 factorial design crossing truth × social valence to show that models are more sensitive to expected approval/disapproval than to truth itself.

language-models ai-safety interpretability deception-detection mechanistic-interpretability transformerlens

Updated Dec 18, 2025
Python

ashioyajotham / exploring_saes

Star

Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.

sparse-autoencoders interpretability activation-functions neuron-activity wandb transformerlens mech-interp

Updated Nov 21, 2025
Python

zilaeric / othello-gpt-probing

Star

Training and exploration of linear probes into Othello-GPT by Li et al. (2022)

probe othello gpt interpretability explainability transformerlens

Updated Jun 29, 2023
Jupyter Notebook

mduffster / self-referent-test

Star

Testing role-based pathways on small LLMs

research transformers pytorch ai-safety interpretability attention-mechanisms ai-alignment llm mechanistic-interpretability transformerlens

Updated Dec 11, 2025
Python

mduffster / epistemic_status

Star

Evaluating how a model 'knowing what it knows' changes from base to instruct

pytorch llm mechanistic-interpretability transformerlens

Updated Jan 21, 2026
Python

alexjackson1 / tx

Star

A Flax-based library for examining transformers, based on TransformerLens.

deep-learning transformers flax jax transformerlens

Updated Feb 11, 2024
Python

Alvoradozerouno / TransformerLens

Star

ORION-TransformerLens Consciousness — Mechanistic interpretability for consciousness research. Fork of TransformerLens (3,115+ stars). Finding consciousness correlates in attention heads.

orion iit consciousness mechanistic-interpretability transformerlens