[MobiSys 2026] On-device parallel ultra-low-bit (ternary) LLM inference with LUT-based mpGeMM kernel.
Updated Jan 1, 2026 · C++
Experimenting with binary brain (a C++ framework for training Binary Neural Networks)
Deploying logic gate networks on FPGAs [Implementation of Differentiable Logic Gate Networks (NeurIPS 2024)]