# Batmobile

High-performance CUDA kernels for equivariant graph neural networks. Batmobile provides optimized implementations of spherical harmonics, tensor products with Clebsch-Gordan coefficients, and fused message-passing operations: the computational bottlenecks in models like MACE, NequIP, and Allegro. Built for L_max=3, targeting molecular dynamics and materials science workloads.
## Installation

Requires the CUDA toolkit (tested with CUDA 12.x) and PyTorch 2.0+.

```bash
pip install .
```

For development:

```bash
pip install -e ".[dev]"
```

## Benchmarks

Measured on an RTX 3090 with N_atoms=1000, C=32, and ~20 neighbors/atom:
| Operation | e3nn (baseline) | Batmobile | Speedup |
|---|---|---|---|
| Spherical Harmonics (L=3) | 0.142 ms | 0.012 ms | 11.8x |
| Tensor Product | 1.847 ms | 0.089 ms | 20.8x |
| TP Backward | 3.21 ms | 0.156 ms | 20.6x |
| Fused SH+TP | 0.574 ms | 0.413 ms | 1.39x |
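Microbenchmarks at this scale are easy to distort with launch overhead and missing synchronization. Below is a minimal sketch of the kind of CUDA-event timing harness these numbers imply; it is illustrative only, and the actual scripts live in `benchmarks/`.

```python
import torch
import batmobile

# ~1000 atoms x ~20 neighbors/atom -> ~20,000 edges (matches the table above)
edge_vectors = torch.randn(20_000, 3, device="cuda")
edge_vectors = edge_vectors / edge_vectors.norm(dim=1, keepdim=True)

# Warm up so compilation and allocator effects don't pollute the measurement.
for _ in range(10):
    batmobile.spherical_harmonics(edge_vectors, L_max=3)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    batmobile.spherical_harmonics(edge_vectors, L_max=3)
end.record()
torch.cuda.synchronize()
print(f"{start.elapsed_time(end) / iters:.3f} ms per call")
```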
Full benchmark at scale (N_atoms=5000, C=64, ~30 neighbors/atom):
| Pipeline | Time | Speedup |
|---|---|---|
| Unfused (SH + TP + scatter) | 8.604 ms | - |
| Fused (fused_sh_tp + scatter) | 5.935 ms | 1.45x |
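Concretely, the two pipelines correspond roughly to the sketch below, using the simple (unweighted) kernels and a plain `index_add_` as the scatter; the benchmarked code may differ in those details.

```python
import torch
import batmobile

def unfused(edge_vectors, node_feats, src, dst, num_nodes):
    # SH then TP, with Y_lm materialized in global memory in between.
    Y_lm = batmobile.spherical_harmonics(edge_vectors, L_max=3)
    messages = batmobile.tensor_product_simple(node_feats[src], Y_lm)
    out = torch.zeros(num_nodes, messages.shape[-1], device=messages.device)
    return out.index_add_(0, dst, messages)  # scatter-sum over destination nodes

def fused(edge_vectors, node_feats, src, dst, num_nodes):
    # One kernel gathers features, computes Y_lm on the fly, and contracts.
    messages = batmobile.fused_sh_tp_simple(edge_vectors, node_feats, src)
    out = torch.zeros(num_nodes, messages.shape[-1], device=messages.device)
    return out.index_add_(0, dst, messages)
```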
## Quick Start

```python
import torch
import batmobile
# Spherical harmonics
edge_vectors = torch.randn(1000, 3, device="cuda")
edge_vectors = edge_vectors / edge_vectors.norm(dim=1, keepdim=True)
Y_lm = batmobile.spherical_harmonics(edge_vectors, L_max=3)  # [1000, 16]: (L_max+1)^2 = 16 components
# Tensor product (simple, no weights)
node_feats = torch.randn(1000, 16, device="cuda")
output = batmobile.tensor_product_simple(node_feats, Y_lm) # [1000, 16]
# Tensor product with channels and weights
node_feats = torch.randn(1000, 32, 16, device="cuda") # [N, C_in, 16]
weights = torch.randn(34, 32, 64, device="cuda")  # [num_paths, C_in, C_out]; 34 CG paths for L_max=3
output = batmobile.tensor_product(node_feats, Y_lm, weights) # [N, C_out, 16]
# Fused SH + TP (eliminates Y_lm from global memory)
node_feats = torch.randn(100, 16, device="cuda")  # per-node features, [num_nodes, 16]
source_idx = torch.randint(0, 100, (1000,), device="cuda")  # source node of each edge
messages = batmobile.fused_sh_tp_simple(edge_vectors, node_feats, source_idx)
```
All operations support PyTorch autograd:

```python
from batmobile.autograd import SphericalHarmonics, TensorProduct
# With autograd
edge_vectors.requires_grad = True
Y_lm = SphericalHarmonics.apply(edge_vectors, 3)
loss = Y_lm.sum()
loss.backward()  # computes grad w.r.t. edge_vectors
```
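In practice the wrappers slot into a model like any other differentiable op. The `EdgeSH` module below is an illustrative sketch, not part of the package:

```python
import torch
from batmobile.autograd import SphericalHarmonics

class EdgeSH(torch.nn.Module):
    """Illustrative module: normalize edge vectors, then expand in Y_lm."""

    def __init__(self, l_max: int = 3):
        super().__init__()
        self.l_max = l_max

    def forward(self, edge_vectors: torch.Tensor) -> torch.Tensor:
        unit = edge_vectors / edge_vectors.norm(dim=1, keepdim=True)
        return SphericalHarmonics.apply(unit, self.l_max)
```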
## Running the Benchmarks

```bash
# Spherical harmonics
python benchmarks/bench_spherical_harmonics.py
# Tensor product
python benchmarks/benchmark_tensor_product.py
# Fused SH+TP
python benchmarks/benchmark_fused_sh_tp.py
# End-to-end MACE layer
python benchmarks/benchmark_e2e_mace.py
```

## API

- `spherical_harmonics(edge_vectors, L_max)` - compute Y_lm for unit vectors
- `spherical_harmonics_backward(edge_vectors, grad_Y_lm)` - manual backward pass
- `tensor_product_simple(input1, input2)` - pure CG contraction, no weights
- `tensor_product(input1, input2, weights)` - with channels and learnable weights
- `get_tp_num_paths()` - returns 34, the number of CG paths for L_max=3
- `get_tp_path_info()` - returns a [34, 3] array of (l1, l2, l_out) triples, one per path
- `fused_sh_tp_simple(edge_vectors, node_features, source_idx)` - fused SH + TP
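The 34 paths are exactly the (l1, l2, l_out) triples with all three values at most 3 that satisfy the Clebsch-Gordan selection rule |l1 - l2| <= l_out <= l1 + l2, which a few lines of Python verify:

```python
# Enumerate the CG-allowed (l1, l2, l_out) triples for L_max = 3; the
# count matches get_tp_num_paths() == 34.
L_MAX = 3
paths = [
    (l1, l2, l_out)
    for l1 in range(L_MAX + 1)
    for l2 in range(L_MAX + 1)
    for l_out in range(abs(l1 - l2), min(l1 + l2, L_MAX) + 1)
]
assert len(paths) == 34
```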
## Project Structure

```
batmobile/
├── include/ # CUDA headers with inline kernels
├── src/
│ ├── spherical_harmonics/
│ ├── tensor_product/
│ └── message_passing/
├── python/batmobile/ # Python package
│ ├── __init__.py # Public API
│ └── autograd.py # torch.autograd.Function wrappers
├── benchmarks/
├── tests/
└── examples/
```
## License

MIT License. See LICENSE.
See LLMS.txt for a structured overview of this codebase, formatted for LLM consumption.
## Citation

If you use Batmobile in your research, please cite:

```bibtex
@software{batmobile2025,
title={Batmobile: High-Performance CUDA Kernels for Equivariant GNNs},
author={Elliot Arledge},
year={2025},
url={https://github.com/Infatoshi/batmobile}
}
```