
fix: make bitsandbytes optional for CPU-only systems #101

Open
ricyoung wants to merge 1 commit into p-e-w:master from ricyoung:feat/cpu-only-mode

Conversation


ricyoung (Contributor) commented on Jan 8, 2026

Summary

Makes bitsandbytes an optional dependency so Heretic can run on CPU-only systems where bitsandbytes (which requires CUDA) is not available.

Changes

  • Made bitsandbytes import conditional with graceful fallback (HAS_BITSANDBYTES flag); see the sketch after this list
  • Added platform condition in pyproject.toml to exclude bitsandbytes on Intel Macs (where it won't work)
  • Added helpful error message when 4-bit quantization is requested but bitsandbytes is unavailable
  • Added documentation comments in config.default.toml about CPU mode and quantization requirements
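
As a hedged sketch of the first and third changes (the HAS_BITSANDBYTES flag matches the PR; the helper name and message text are illustrative, not the PR's actual code):

    try:
        import bitsandbytes  # noqa: F401  (needs a supported backend)
        HAS_BITSANDBYTES = True
    except ImportError:
        HAS_BITSANDBYTES = False

    def check_quantization(quantization: str) -> None:
        # Hypothetical helper: fail with a helpful message instead of a
        # bare ImportError deep inside model loading.
        if quantization == "bnb_4bit" and not HAS_BITSANDBYTES:
            raise RuntimeError(
                "4-bit quantization requires bitsandbytes, which is not "
                'available on this system. Set quantization = "none" instead.'
            )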

Usage

For CPU-only systems, users can set device_map = "cpu" in their config (this was already supported by Accelerate).
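
For example (setting names as used elsewhere in this PR; values illustrative):

    # CPU-only mode: offload the whole model to CPU RAM via Accelerate.
    device_map = "cpu"
    # 4-bit quantization needs bitsandbytes; disable it when unavailable.
    quantization = "none"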

Test plan

  • Verified bitsandbytes import is skipped gracefully on systems without CUDA
  • Verified clear error message when quantization requested without bitsandbytes
  • Verified existing GPU workflow still works unchanged

Closes #12

gemini-code-assist (bot) commented

Summary of Changes

Hello @ricyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances the application's flexibility and compatibility by introducing explicit control over the compute device. Users can now specify whether to run on CPU, CUDA, or MPS, making the application more adaptable to various hardware configurations, including CPU-only systems. The changes also include robust handling of dependencies like bitsandbytes and optimized dtype selection for CPU, ensuring a smoother experience across different environments.

Highlights

  • Explicit Device Selection: Introduced a new --device CLI option allowing users to explicitly choose the compute device for inference, with options for auto (default), cpu, cuda, and mps.
  • Conditional bitsandbytes Import: The bitsandbytes library is now imported conditionally, ensuring graceful fallback and preventing errors on systems where it's not available or compatible (e.g., Intel Macs).
  • CPU-Optimized Behavior: When running in CPU-only mode, 4-bit quantization is automatically disabled (as bitsandbytes requires CUDA), and float32 dtype is prioritized for better compatibility.
  • Device Availability Validation: The system now validates the availability of explicitly requested devices (CUDA, MPS) and provides informative error messages if the chosen device is not found (see the sketch below).
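
A hedged sketch of the device resolution and validation described in the last two highlights (resolve_device is a hypothetical name; the torch availability checks are the standard ones):

    import torch

    def resolve_device(device: str) -> str:
        # Hypothetical helper illustrating the checks described above.
        if device == "auto":
            if torch.cuda.is_available():
                return "cuda"
            if torch.backends.mps.is_available():
                return "mps"
            return "cpu"
        if device == "cuda" and not torch.cuda.is_available():
            raise RuntimeError("CUDA requested, but no CUDA device was found")
        if device == "mps" and not torch.backends.mps.is_available():
            raise RuntimeError("MPS requested, but MPS is not available")
        return device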



gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a valuable --device flag, enhancing flexibility by allowing explicit device selection and enabling CPU-only execution. The changes are well-structured, with conditional dependency management for bitsandbytes and clear configuration options. The logic for handling different devices, including graceful fallbacks and user warnings, is robust. I have one minor suggestion to simplify a piece of logic for improved readability, but overall, this is an excellent contribution.

Comment on lines 92 to 95:

    if "float32" in dtypes_to_try:
        dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"]
    else:
        dtypes_to_try = ["float32"] + list(dtypes_to_try)


Severity: medium

This logic for prioritizing float32 on CPU can be simplified. The if/else block is functionally correct, but both branches reduce to a single list comprehension: filtering out "float32" and then prepending it yields the same result whether or not it was already in the list.

Suggested change:

    dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"]

p-e-w (Owner) commented on Jan 8, 2026

> This PR adds a --device CLI option that allows users to explicitly select the compute device, enabling Heretic to run on systems without a GPU.

Isn't this what already happens if you install the CPU version of PyTorch?

ricyoung (Contributor, Author) commented on Jan 8, 2026

Good question! You're right that CPU-only PyTorch works, but this PR addresses a few additional issues:

  1. bitsandbytes dependency - Even with CPU PyTorch installed, bitsandbytes will fail to import on systems without CUDA (it's a hard dependency in pyproject.toml). This PR makes the import conditional and adds a platform marker to skip it entirely on Intel Macs, where it won't even install (marker sketched below).

  2. Explicit device control - Sometimes users want to force CPU even when a GPU is available (e.g., for testing, debugging, or when GPU memory is limited). --device cpu provides that option.

  3. Better UX for edge cases - When running on a system without GPU support:

    • Shows a clear warning: "Running in CPU-only mode. This will be slow."
    • Automatically disables quantization with an explanation
    • Prefers float32 dtype for maximum compatibility

The main motivation was the original issue #12 where bitsandbytes caused installation/import failures on Intel Macs. Rather than just patching that specific case, this provides a more general solution.
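
For reference, the platform marker amounts to a PEP 508 environment marker in pyproject.toml, roughly like this (the exact marker text in the PR may differ):

    [project]
    dependencies = [
        # Skip bitsandbytes on Intel Macs, where it cannot be installed;
        # keep it everywhere else (PEP 508 environment marker).
        'bitsandbytes; sys_platform != "darwin" or platform_machine == "arm64"',
    ]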

That said, if you think the simpler approach (just making bitsandbytes optional) is preferable, I'm happy to slim down the PR to just that change.

    dtypes_to_try = settings.dtypes
    if settings.device == DeviceType.CPU:
        # Put float32 first for CPU, as it's most compatible
        dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"]
p-e-w (Owner) commented:

I don't like this. There is a setting for dtypes, and the program should use it as specified. Not magically do something else in a certain configuration. If the user wants something else, they can configure that.

Note that many newer CPUs have native support for BF16, so this often does the wrong thing anyway.

ricyoung (Contributor, Author) replied:

You're right - some CPUs do have support for BF16:

Intel Xeon: support started with 3rd Generation Intel Xeon Scalable processors (Cooper Lake) and continues in later generations, using the AVX-512 BF16 extensions.

AMD Zen: modern AMD processors based on the Zen architecture, including certain AMD EPYC (Zen 4) and Ryzen AI series parts, incorporate BF16 support.

Good point.
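
A quick, hedged sketch of how one might detect native BF16 support on Linux x86 (the avx512_bf16/amx_bf16 flag names are as exposed in /proc/cpuinfo; the helper itself is hypothetical and Linux-only):

    def cpu_supports_bf16() -> bool:
        # Linux-only sketch: scan /proc/cpuinfo for BF16-capable ISA flags.
        # avx512_bf16 covers Cooper Lake+ Xeons and Zen 4; amx_bf16 covers
        # newer Xeons with AMX.
        try:
            with open("/proc/cpuinfo") as f:
                flags = f.read()
        except OSError:
            return False
        return "avx512_bf16" in flags or "amx_bf16" in flags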

# "cpu" - Force CPU-only mode (slow but works anywhere)
# "cuda" - Force NVIDIA CUDA GPU
# "mps" - Force Apple Metal Performance Shaders (Apple Silicon)
device = "auto"
p-e-w (Owner) commented on Jan 9, 2026:

What exactly does this setting offer that the device_map setting doesn't?

You can set device_map to cpu already, and it will offload the entire model to CPU RAM. This is explicitly documented in Accelerate.

- Make bitsandbytes import conditional with graceful fallback
- Exclude bitsandbytes dependency on Intel Macs where it can't work
- Provide clear error message when quantization requested without bitsandbytes
- Add documentation for CPU-only usage (device_map = "cpu")

This allows Heretic to run on systems without CUDA support by using
device_map = "cpu" and quantization = "none".

Closes p-e-w#12
@ricyoung changed the title from "feat: add --device flag for CPU-only and explicit device selection" to "fix: make bitsandbytes optional for CPU-only systems" on Jan 12, 2026

# Quantization method to use when loading the model.
# Options: "none" (no quantization), "bnb_4bit" (4-bit quantization using bitsandbytes).
# Note: 4-bit quantization requires bitsandbytes, which needs CUDA support.
p-e-w (Owner) commented:

That contradicts the bnb docs: https://huggingface.co/docs/transformers/en/quantization/bitsandbytes

> bitsandbytes is supported on NVIDIA GPUs for CUDA versions 11.8 - 13.0, Intel XPU, Intel Gaudi (HPU), and CPU.

There are several mentions on that page of CPU-only usage.

A contributor replied:

Note that ROCm-via-HIP is also (partially) supported: https://github.com/bitsandbytes-foundation/bitsandbytes#legend

But yeah, it seems CPUs should be supported as well, except for "8-bit Optimizers" (whatever that means).
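
For reference, per the docs quoted above, pinning a bnb-quantized load to CPU would look roughly like this (a sketch; it assumes a bitsandbytes build with CPU backend support, and the model id is a placeholder):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Sketch: 4-bit bnb loading pinned to CPU, per the docs quoted above.
    model = AutoModelForCausalLM.from_pretrained(
        "org/model",  # placeholder
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="cpu",
    )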
