Conversation
Did you get it working? CPU inference works just fine with the current version on my system.
Yes, I was being dumb and used the default Python 3.12 from uv. Setting it to 3.11 worked just fine, so I closed the issue. Thank you!
Actually, I am reopening this with a few other provisions for the case of older computers like mine (Mac Pro, 2019, Intel x64, no GPU, though maybe integrated graphics that shows up as MPS, e.g., Intel UHD 630). I was having compatibility issues (need PyTorch >2.1, <2.3; NumPy <2.0), and on top of that this specific PyTorch version does not support bfloat16. I hope this helps and doesn't cause issues for anyone else. My initial problem was just getting it installed, which the initial PR was for; this PR is for actual runtime use.
I don't get it. Why should Python 3.12 fail? I use it all the time, and it works fine with CPU inference.
You are correct; I have reverted the Python version constraint. The original problem was with the torch and numpy versions (which have been resolved).
pyproject.toml (outdated)
```diff
   { name = "Philipp Emanuel Weidmann", email = "pew@worldwidemann.com" }
 ]
-requires-python = ">=3.10"
+requires-python = ">=3.10,<3.13" # Supports 3.10-3.12 (verified with Python 3.12)
```
I did not verify 3.13; 3.12 was the highest version I verified.
pyproject.toml (outdated)
```toml
# macOS Intel (x86_64): Use >=2.1.0,<2.3.0 (2.3+ dropped macOS Intel support; last with wheels was 2.2.0)
"torch>=2.1.0,<2.3.0; platform_machine == 'x86_64' and sys_platform == 'darwin'",
# macOS Silicon (ARM64): Can use newer versions with Python 3.12
"torch>=2.0.0; platform_machine == 'arm64' and sys_platform == 'darwin'",
```
Version >= 2.2 is required; 2.0 fails (I don't quite remember what the problem was, some multiplication kernel issue, I think).
Okay, by default 2.2.2 is installed; we can set this constraint as well.
```toml
# Other platforms (Linux, Windows): Can use newer versions
"torch>=2.2.0; sys_platform != 'darwin'",
# NumPy: PyTorch 2.1-2.2 (macOS Intel) require NumPy 1.x (not 2.x)
"numpy<2.0; platform_machine == 'x86_64' and sys_platform == 'darwin'",
```
Do these dependency specs automatically install the right version on all Mac platforms (both Intel and M series)?
I have no experience with Macs so I'm a little out of my depth here.
That should be incidental; it doesn't matter, as they all point to the same version:

```toml
# macOS Silicon (ARM64): Can use newer versions with Python 3.12
"torch>=2.2.0; platform_machine == 'arm64' and sys_platform == 'darwin'",
# Other platforms (Linux, Windows): Can use newer versions
"torch>=2.2.0; sys_platform != 'darwin'",
```
```diff
-for dtype in settings.dtypes:
-    print(f"* Trying dtype [bold]{dtype}[/]... ", end="")
+# Filter dtypes: MPS doesn't support bfloat16, so skip "auto" on MPS
+# (since "auto" typically resolves to bfloat16)
```
Doesn't that happen automatically when the MPS backend is used? It would be strange if a backend tried to load a format it doesn't support.
What happens if you use auto on MPS?
It did not for mine. I believe it is because it technically is using MPS, but it is more of an integrated-graphics device (Intel UHD 630) than an actual Metal backend. On any Apple Silicon Mac I do not believe that error would occur; however, I do not have an M-series chip to test with.
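For anyone who wants to check what their MPS device actually does with bfloat16, a minimal probe along these lines should reproduce the behavior described (a sketch; the exact exception type depends on the torch build):

```python
import torch

# Probe whether the local MPS device can actually run bfloat16 ops.
# On some setups (e.g., integrated Intel graphics exposed as MPS),
# this is expected to fail rather than silently work.
if torch.backends.mps.is_available():
    try:
        x = torch.ones(2, 2, dtype=torch.bfloat16, device="mps")
        y = x @ x  # a matmul is where unsupported kernels tend to surface
        print("bfloat16 on MPS works:", y.dtype)
    except (RuntimeError, TypeError) as e:
        print("bfloat16 on MPS failed:", e)
else:
    print("MPS backend not available")
```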
```python
dtypes_to_try = [dtype for dtype in dtypes_to_try if dtype != "auto"]
if not dtypes_to_try:
    # If only "auto" was specified, default to float16 for MPS
    dtypes_to_try = ["float16"]
```
That's a lot of magic, and might make it difficult for the user to understand why problems happen if they explicitly specified a dtype cascade and the program just does something else.
Fair points. I made changes so this only affects an x86_64 CPU trying to use MPS, i.e., the small subset of users on a non-GPU Mac.
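A sketch of what such a narrowed guard could look like (not necessarily Heretic's actual code; the example cascade and variable names are illustrative, the condition matches the diff below):

```python
import platform

import torch

# Example dtype cascade as a user might configure it.
dtypes_to_try = ["auto", "bfloat16", "float16"]

# Only filter the cascade on the narrow problem case:
# an x86_64 Mac whose integrated graphics is exposed as MPS.
is_intel_x86_64 = platform.machine() == "x86_64"
use_mps = torch.backends.mps.is_available()

if use_mps and is_intel_x86_64:
    dtypes_to_try = [d for d in dtypes_to_try if d != "auto"]
    if not dtypes_to_try:
        # If only "auto" was specified, fall back to float16
        dtypes_to_try = ["float16"]
```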
src/heretic/model.py (outdated)
```python
else:
    self.model = AutoModelForCausalLM.from_pretrained(
        settings.model,
        torch_dtype=torch_dtype,
```
This triggers a deprecation warning with newer Transformers versions. The argument name is just dtype now.
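For clarity, the non-deprecated spelling on recent Transformers versions looks like this (a sketch; the model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# "dtype" replaces the deprecated "torch_dtype" keyword
# on recent Transformers versions.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # illustrative model name
    dtype=torch.float16,
)
```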
src/heretic/model.py (outdated)
```python
        dtype=dtype,
        device_map=settings.device_map,
    )
# Convert dtype string to torch dtype object
```
Why is this necessary? I'm pretty sure torch accepts both strings and objects as arguments, and performs the conversion automatically.
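If that's right, the two spellings below should be equivalent and the manual conversion is dead code (a sketch of the claim, as I understand recent Transformers; model name illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Both should load the model in half precision; Transformers
# converts the string to a torch.dtype internally.
m1 = AutoModelForCausalLM.from_pretrained("gpt2", dtype="float16")
m2 = AutoModelForCausalLM.from_pretrained("gpt2", dtype=torch.float16)
assert m1.dtype == m2.dtype == torch.float16
```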
src/heretic/model.py (outdated)
```python
# Then convert dtype explicitly, then move to MPS
self.model = AutoModelForCausalLM.from_pretrained(
    settings.model,
    torch_dtype=None,  # Load as-is first
```
Argument name is still wrong: torch_dtype => dtype, otherwise a warning is raised on recent Transformers versions. Same thing below.
| dtypes_to_try = ["float16"] | ||
|
|
||
| for dtype in dtypes_to_try: | ||
| print(f"* Trying dtype [bold]{dtype_str}[/]... ", end="") |
| print(f"* Trying dtype [bold]{dtype_str}[/]... ", end="") | |
| print(f"* Trying dtype [bold]{dtype}[/]... ", end="") |
This will raise an error otherwise.
```python
if use_mps and is_intel_x86_64:
    # Load to CPU first without dtype to avoid bfloat16 preservation
    # Then convert dtype explicitly, then move to MPS
    self.model = AutoModelForCausalLM.from_pretrained(
```
The same thing needs to happen in reload_model below, which is called on every trial.
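One way to keep the two paths from drifting apart would be to factor the workaround into a helper used by both call sites (a sketch only; load/reload structure, attribute names, and the helper name are assumptions, not Heretic's actual API):

```python
from transformers import AutoModelForCausalLM


class Model:
    def _load_with_mps_workaround(self, dtype):
        # Shared by the initial load and reload_model, so the
        # Intel-Mac workaround is applied consistently on every trial.
        if self.use_mps and self.is_intel_x86_64:
            model = AutoModelForCausalLM.from_pretrained(
                self.settings.model,
                dtype=None,        # load as-is first
                device_map="cpu",  # then convert and move explicitly
            )
            return model.to(dtype).to("mps")
        return AutoModelForCausalLM.from_pretrained(
            self.settings.model,
            dtype=dtype,
            device_map=self.settings.device_map,
        )
```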
| device_map="cpu", # Load to CPU first | ||
| low_cpu_mem_usage=False, # Ensure full conversion | ||
| ) | ||
| # Convert to desired dtype explicitly (this forces conversion) |
What exactly is the difference between this and just specifying the dtype directly when loading the model?
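For comparison, the two approaches side by side (a sketch; model name illustrative). The open question is whether the cast performed at load time behaves differently from an explicit .to() conversion on this hardware:

```python
import torch
from transformers import AutoModelForCausalLM

name = "gpt2"  # illustrative; any causal LM checkpoint

# Direct: cast to float16 while loading.
direct = AutoModelForCausalLM.from_pretrained(name, dtype=torch.float16)

# Two-step: load first (defaults to float32), then convert explicitly.
two_step = AutoModelForCausalLM.from_pretrained(name, dtype=None)
two_step = two_step.to(torch.float16)

# If the loader's cast behaves the same as .to(), these match;
# the PR's premise is that on Intel-Mac MPS setups they did not.
print(direct.dtype, two_step.dtype)
```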
Please rebase on top of master to run CI.
Add a new --device CLI option that allows users to explicitly select the compute device (auto, cpu, cuda, mps). This enables running Heretic on systems without a GPU.

Changes:
- Add DeviceType enum with AUTO, CPU, CUDA, MPS options
- Add --device setting that derives device_map automatically
- Make bitsandbytes import conditional with graceful fallback
- Exclude bitsandbytes from Intel Macs in pyproject.toml
- Auto-disable quantization on CPU (requires CUDA)
- Prefer float32 dtype on CPU for best compatibility
- Add validation for explicit CUDA/MPS device requests

Closes p-e-w#12
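A sketch of what the described enum and device_map derivation could look like (the enum members come from the commit message; the derivation logic itself is an assumption):

```python
from enum import Enum


class DeviceType(str, Enum):
    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"


def derive_device_map(device: DeviceType) -> str:
    # "auto" defers placement to Accelerate; any explicit choice
    # pins the whole model to the requested device.
    return "auto" if device is DeviceType.AUTO else device.value
```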
- Make bitsandbytes import conditional with graceful fallback
- Exclude bitsandbytes dependency on Intel Macs where it can't work
- Provide clear error message when quantization requested without bitsandbytes
- Add documentation for CPU-only usage (device_map = "cpu")

This allows Heretic to run on systems without CUDA support by using device_map = "cpu" and quantization = "none".

Closes p-e-w#12
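The conditional import with graceful fallback presumably follows the standard pattern below (a sketch; the flag and function names are illustrative, not necessarily the actual implementation):

```python
try:
    import bitsandbytes  # noqa: F401  (only checking availability)
    HAS_BITSANDBYTES = True
except ImportError:
    HAS_BITSANDBYTES = False


def check_quantization(quantization: str) -> None:
    # Fail with a clear message instead of a raw ImportError when
    # quantization is requested but bitsandbytes is missing.
    if quantization != "none" and not HAS_BITSANDBYTES:
        raise RuntimeError(
            "Quantization requires bitsandbytes, which is not installed "
            '(unavailable on this platform). Use quantization = "none".'
        )
```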
I have an older Mac with an Intel processor on which Heretic was not working; this PR makes slight pyproject.toml changes to pin appropriately older versions of the dependencies, enabling non-GPU use cases (mainly for testing).