
Ability to run on CPU only (no GPU) #12

Open

noahvandal wants to merge 12 commits into p-e-w:master from noahvandal:master

Conversation

@noahvandal

I have an older Mac with an Intel processor that was not working; this makes slight pyproject.toml changes to pin older software versions suitable for non-GPU use cases (mainly for testing).

@noahvandal noahvandal closed this Nov 17, 2025
@p-e-w
Owner

p-e-w commented Nov 17, 2025

Did you get it working? CPU inference works just fine with the current version on my system.

@noahvandal
Author

Yes, I was being dumb and used the default Python 3.12 from uv. Setting it to 3.11 worked just fine, so I closed this. Thank you!

@noahvandal noahvandal reopened this Nov 17, 2025
@noahvandal
Author

Actually, I am reopening this with a few other provisions for older computers like mine (Mac Pro, 2019, Intel x64, no GPU; at most integrated graphics that shows up as MPS, e.g. Intel UHD 630).

I was having compatibility issues (PyTorch needs to be >=2.1 and <2.3, and NumPy <2.0), and this setup also does not support bfloat16.

I hope this helps and doesn't cause issues for anyone else. My initial problem was just getting it installed, which the initial PR addressed; this PR is for actual runtime use.
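For anyone hitting the same thing, a minimal probe along these lines (a hypothetical sketch, not part of the diff, assuming PyTorch >= 2.1 is installed) shows whether the MPS backend on a given machine can actually create bfloat16 tensors:

import torch

# Illustrative probe: check whether the MPS backend is available and whether
# a bfloat16 tensor can be created on it.
if torch.backends.mps.is_available():
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        print("MPS supports bfloat16 on this machine")
    except (RuntimeError, TypeError) as e:
        print(f"MPS available, but bfloat16 failed ({e}); fall back to float16")
else:
    print("No MPS backend available; CPU-only inference")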

@p-e-w
Owner

p-e-w commented Nov 17, 2025

I don't get it. Why should Python 3.12 fail? I use it all the time, and it works fine with CPU inference.

@noahvandal
Author

You are correct; I have reverted the Python version constraint. The original problem was with the torch and numpy versions (which have been resolved).

pyproject.toml Outdated
{ name = "Philipp Emanuel Weidmann", email = "pew@worldwidemann.com" }
]
requires-python = ">=3.10"
requires-python = ">=3.10,<3.13" # Supports 3.10-3.12 (verified with Python 3.12)
Owner

Does 3.13 fail?

Author

I did not verify; 3.12 was the highest version I tested.

pyproject.toml Outdated
# macOS Intel (x86_64): Use >=2.1.0,<2.3.0 (2.3+ dropped macOS Intel support; last with wheels was 2.2.0)
"torch>=2.1.0,<2.3.0; platform_machine == 'x86_64' and sys_platform == 'darwin'",
# macOS Silicon (ARM64): Can use newer versions with Python 3.12
"torch>=2.0.0; platform_machine == 'arm64' and sys_platform == 'darwin'",
Owner

Version >= 2.2 is required. 2.0 fails (don't quite remember what the problem was, some multiplication kernel issue I think).

Author

Okay, 2.2.2 is installed by default; we can set this parameter as well.

# Other platforms (Linux, Windows): Can use newer versions
"torch>=2.2.0; sys_platform != 'darwin'",
# NumPy: PyTorch 2.1-2.2 (macOS Intel) require NumPy 1.x (not 2.x)
"numpy<2.0; platform_machine == 'x86_64' and sys_platform == 'darwin'",
Owner

Do these dependency specs automatically install the right version on all Mac platforms (both Intel and M series)?

I have no experience with Macs so I'm a little out of my depth here.

Author

That should be incidental; it doesn't matter, as they all point to the same version.

# macOS Silicon (ARM64): Can use newer versions with Python 3.12
"torch>=2.2.0; platform_machine == 'arm64' and sys_platform == 'darwin'",
# Other platforms (Linux, Windows): Can use newer versions
"torch>=2.2.0; sys_platform != 'darwin'",

for dtype in settings.dtypes:
print(f"* Trying dtype [bold]{dtype}[/]... ", end="")
# Filter dtypes: MPS doesn't support bfloat16, so skip "auto" on MPS
# (since "auto" typically resolves to bfloat16)
Owner

Doesn't that happen automatically when the MPS backend is used? It would be strange if a backend tried to load a format it doesn't support.

Owner

What happens if you use auto on MPS?

Author

It did not for mine. I believe this is because it is technically using MPS, but it is really integrated graphics (Intel UHD 630) rather than an actual Metal backend. On an Apple Silicon Mac I do not believe that error would occur, but I do not have an M-series chip to verify.

dtypes_to_try = [dtype for dtype in dtypes_to_try if dtype != "auto"]
if not dtypes_to_try:
# If only "auto" was specified, default to float16 for MPS
dtypes_to_try = ["float16"]
Owner

That's a lot of magic, and might make it difficult for the user to understand why problems happen if they explicitly specified a dtype cascade and the program just does something else.

Author

Fair points. I made changes so this only applies to an x86_64 CPU trying to use MPS, i.e. the small subset of users on a non-GPU Mac.
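A guard along these lines (a minimal sketch of the described behavior, not the exact code in the diff) limits the dtype filtering to that case:

import platform
import sys

import torch

# Sketch: only touch the dtype cascade when an Intel (x86_64) Mac reports an
# MPS device, where bfloat16 loading is known to fail; leave all other
# platforms' settings untouched.
is_intel_mac = sys.platform == "darwin" and platform.machine() == "x86_64"
use_mps = torch.backends.mps.is_available()

dtypes_to_try = ["auto", "bfloat16", "float16"]
if use_mps and is_intel_mac:
    dtypes_to_try = [d for d in dtypes_to_try if d not in ("auto", "bfloat16")]
    if not dtypes_to_try:
        dtypes_to_try = ["float16"]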

else:
self.model = AutoModelForCausalLM.from_pretrained(
settings.model,
torch_dtype=torch_dtype,
Owner

This triggers a deprecation warning with newer Transformers versions. The argument name is just dtype now.
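For illustration, the non-deprecated spelling looks like this (placeholder model id, not the PR's settings object):

from transformers import AutoModelForCausalLM

# Recent Transformers versions take `dtype` instead of `torch_dtype`;
# "gpt2" is only a placeholder model id.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    dtype="float16",
)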

dtype=dtype,
device_map=settings.device_map,
)
# Convert dtype string to torch dtype object
Owner

Why is this necessary? I'm pretty sure torch accepts both strings and objects as arguments, and performs the conversion automatically.
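A quick way to see that the conversion is mechanical (illustrative only):

import torch

# A dtype name resolves to the torch.dtype object via getattr, which is all
# the manual conversion in this diff does; libraries that accept strings
# typically perform the same lookup internally.
assert getattr(torch, "float16") is torch.float16
assert isinstance(torch.float16, torch.dtype)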

# Then convert dtype explicitly, then move to MPS
self.model = AutoModelForCausalLM.from_pretrained(
settings.model,
torch_dtype=None, # Load as-is first
Owner

Argument name is still wrong: torch_dtype => dtype, otherwise a warning is raised on recent Transformers versions. Same thing below.

dtypes_to_try = ["float16"]

for dtype in dtypes_to_try:
print(f"* Trying dtype [bold]{dtype_str}[/]... ", end="")
Owner

Suggested change
print(f"* Trying dtype [bold]{dtype_str}[/]... ", end="")
print(f"* Trying dtype [bold]{dtype}[/]... ", end="")

This will raise an error otherwise.

if use_mps and is_intel_x86_64:
# Load to CPU first without dtype to avoid bfloat16 preservation
# Then convert dtype explicitly, then move to MPS
self.model = AutoModelForCausalLM.from_pretrained(
Owner

The same thing needs to happen in reload_model below, which is called on every trial.

device_map="cpu", # Load to CPU first
low_cpu_mem_usage=False, # Ensure full conversion
)
# Convert to desired dtype explicitly (this forces conversion)
Owner

What exactly is the difference between this and just specifying the dtype directly when loading the model?
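Side by side, the two approaches under discussion look roughly like this (a hypothetical sketch with a placeholder model id, run on a machine with an MPS device):

import torch
from transformers import AutoModelForCausalLM

# Approach in this diff: load the checkpoint's native weights on CPU,
# convert the dtype explicitly, then move to MPS.
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype=None, device_map="cpu")
model = model.to(dtype=torch.float16).to("mps")

# Simpler alternative the comment asks about: convert at load time, then move.
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype="float16")
model = model.to("mps")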

@p-e-w
Owner

p-e-w commented Nov 19, 2025

Please rebase on top of master to run CI.

ricyoung added a commit to ricyoung/heretic that referenced this pull request Jan 8, 2026
Add a new --device CLI option that allows users to explicitly select
the compute device (auto, cpu, cuda, mps). This enables running Heretic
on systems without a GPU.

Changes:
- Add DeviceType enum with AUTO, CPU, CUDA, MPS options
- Add --device setting that derives device_map automatically
- Make bitsandbytes import conditional with graceful fallback
- Exclude bitsandbytes from Intel Macs in pyproject.toml
- Auto-disable quantization on CPU (requires CUDA)
- Prefer float32 dtype on CPU for best compatibility
- Add validation for explicit CUDA/MPS device requests

Closes p-e-w#12
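A sketch of what that device selection could look like (hypothetical names, based only on the commit message, not the actual commit):

from enum import Enum

import torch

# Hypothetical sketch of the --device option described above: an explicit
# device choice from which device_map is derived.
class DeviceType(str, Enum):
    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"

def derive_device_map(device: DeviceType) -> str:
    if device is DeviceType.AUTO:
        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
        return "cpu"
    return device.value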
ricyoung added a commit to ricyoung/heretic that referenced this pull request Jan 11, 2026
- Make bitsandbytes import conditional with graceful fallback
- Exclude bitsandbytes dependency on Intel Macs where it can't work
- Provide clear error message when quantization requested without bitsandbytes
- Add documentation for CPU-only usage (device_map = "cpu")

This allows Heretic to run on systems without CUDA support by using
device_map = "cpu" and quantization = "none".

Closes p-e-w#12
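The conditional import described there could look roughly like this (hypothetical sketch, not the actual commit):

# Hypothetical sketch of the graceful bitsandbytes fallback described above.
try:
    import bitsandbytes  # noqa: F401
    HAS_BITSANDBYTES = True
except ImportError:
    HAS_BITSANDBYTES = False

def check_quantization(quantization: str) -> None:
    # Give a clear error when quantization is requested but unavailable.
    if quantization != "none" and not HAS_BITSANDBYTES:
        raise RuntimeError(
            "Quantization requires bitsandbytes, which is not installed; "
            "set quantization = 'none' for CPU-only use."
        )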