fix: make bitsandbytes optional for CPU-only systems#101
fix: make bitsandbytes optional for CPU-only systems#101ricyoung wants to merge 1 commit intop-e-w:masterfrom
Conversation
Summary of ChangesHello @ricyoung, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's flexibility and compatibility by introducing explicit control over the compute device. Users can now specify whether to run on CPU, CUDA, or MPS, making the application more adaptable to various hardware configurations, including CPU-only systems. The changes also include robust handling of dependencies like Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable --device flag, enhancing flexibility by allowing explicit device selection and enabling CPU-only execution. The changes are well-structured, with conditional dependency management for bitsandbytes and clear configuration options. The logic for handling different devices, including graceful fallbacks and user warnings, is robust. I have one minor suggestion to simplify a piece of logic for improved readability, but overall, this is an excellent contribution.
src/heretic/model.py
Outdated
| if "float32" in dtypes_to_try: | ||
| dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"] | ||
| else: | ||
| dtypes_to_try = ["float32"] + list(dtypes_to_try) |
There was a problem hiding this comment.
This logic for prioritizing float32 on CPU can be simplified. The if/else block is functionally correct but can be expressed more concisely with a single list comprehension that handles both cases (whether "float32" is already in the list or not).
dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"]
Isn't this what already happens if you install the CPU version of PyTorch? |
|
Good question! You're right that CPU-only PyTorch works, but this PR addresses a few additional issues:
The main motivation was the original issue #12 where bitsandbytes caused installation/import failures on Intel Macs. Rather than just patching that specific case, this provides a more general solution. That said, if you think the simpler approach (just making bitsandbytes optional) is preferable, I'm happy to slim down the PR to just that change. |
src/heretic/model.py
Outdated
| dtypes_to_try = settings.dtypes | ||
| if settings.device == DeviceType.CPU: | ||
| # Put float32 first for CPU, as it's most compatible | ||
| dtypes_to_try = ["float32"] + [d for d in dtypes_to_try if d != "float32"] |
There was a problem hiding this comment.
I don't like this. There is a setting for dtypes, and the program should use it as specified. Not magically do something else in a certain configuration. If the user wants something else, they can configure that.
Note that many newer CPUs have native support for BF16, so this often does the wrong thing anyway.
There was a problem hiding this comment.
I guess - some CPU have support for BF16.
Intel Xeon: Support started with 3rd Generation Intel Xeon Scalable Processors (Cooper Lake) and continues in later generations, utilizing the AVX-512 BF16 extensions.
AMD Zen: Modern AMD processors based on the Zen architecture, including certain AMD EPYC (Zen 4) and Ryzen AI series, incorporate BF16 support.
Good point
config.default.toml
Outdated
| # "cpu" - Force CPU-only mode (slow but works anywhere) | ||
| # "cuda" - Force NVIDIA CUDA GPU | ||
| # "mps" - Force Apple Metal Performance Shaders (Apple Silicon) | ||
| device = "auto" |
There was a problem hiding this comment.
What exactly does this setting offer that the device_map setting doesn't?
You can set device_map to cpu already, and it will offload the entire model to CPU RAM. This is explicitly documented in Accelerate.
- Make bitsandbytes import conditional with graceful fallback - Exclude bitsandbytes dependency on Intel Macs where it can't work - Provide clear error message when quantization requested without bitsandbytes - Add documentation for CPU-only usage (device_map = "cpu") This allows Heretic to run on systems without CUDA support by using device_map = "cpu" and quantization = "none". Closes p-e-w#12
2ad1182 to
41617c0
Compare
|
|
||
| # Quantization method to use when loading the model. | ||
| # Options: "none" (no quantization), "bnb_4bit" (4-bit quantization using bitsandbytes). | ||
| # Note: 4-bit quantization requires bitsandbytes, which needs CUDA support. |
There was a problem hiding this comment.
That contradicts the bnb docs: https://huggingface.co/docs/transformers/en/quantization/bitsandbytes
bitsandbytes is supported on NVIDIA GPUs for CUDA versions 11.8 - 13.0, Intel XPU, Intel Gaudi (HPU), and CPU.
There are several mentions on that page of CPU-only usage.
There was a problem hiding this comment.
Note that ROCm-via-HIP is also (partially) supported: https://github.com/bitsandbytes-foundation/bitsandbytes#legend
But yeah it seems like CPUs should be supported as well, except for "8-bit Optimizers" (whatever that means).
Summary
Makes bitsandbytes an optional dependency so Heretic can run on CPU-only systems where bitsandbytes (which requires CUDA) is not available.
Changes
HAS_BITSANDBYTESflag)Usage
For CPU-only systems, users can set
device_map = "cpu"in their config (this was already supported by Accelerate).Test plan
Closes #12