fix: enable MPS device support for macOS Apple Silicon #64
Open
Abeansits wants to merge 2 commits into NVIDIA:main from
Conversation
Uncomment the existing MPS backend detection in `torch_auto_device()` and add `"mps"` to the `DeviceString` type. Change the `--device` default from the hardcoded `"cuda"` to `None` so auto-detection picks the best available backend (CUDA > MPS > CPU). The MPS code path in `loaders.py` (loading safetensors via CPU then moving to MPS) was already implemented but unreachable because the server entrypoint had MPS commented out. Tested on M3 Pro (36 GB) with Python 3.12 / PyTorch 2.4.1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
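The auto-detection order described above (CUDA > MPS > CPU) can be sketched as follows. Availability is passed in as flags so the sketch stays framework-free; the real `torch_auto_device()` queries `torch.cuda.is_available()` and `torch.backends.mps.is_available()` instead, and the name `auto_device` here is illustrative.

```python
def auto_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available backend: CUDA, then MPS, then CPU fallback."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# On Apple Silicon without CUDA, auto-detection should land on MPS.
print(auto_device(cuda_available=False, mps_available=True))
```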
`aten::index_copy.out` is not implemented for the MPS device in PyTorch 2.4. Replace with equivalent slice assignment in the KV cache which is supported on all backends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
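The replacement described in this commit can be illustrated in isolation: writing new timesteps into a preallocated cache with `index_copy_` along the time dimension is equivalent to a plain slice assignment when the indices are contiguous, and slice assignment is supported on all backends. The shapes below are a toy example, not taken from the actual KV cache code.

```python
import torch

# Toy KV cache: (batch, heads, max_len, head_dim); write 3 new steps at pos 2.
cache = torch.zeros(1, 2, 8, 4)
new_k = torch.randn(1, 2, 3, 4)
pos = 2

# Old path: index_copy_ along dim 2 (aten::index_copy.out is not
# implemented for MPS in PyTorch 2.4).
a = cache.clone()
a.index_copy_(2, torch.arange(pos, pos + 3), new_k)

# New path: equivalent slice assignment, supported everywhere.
b = cache.clone()
b[:, :, pos:pos + 3] = new_k

assert torch.equal(a, b)
```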
Author
@rajarshiroy-nvidia check it out when you have time
Summary

- Uncomment the existing MPS backend detection in `torch_auto_device()` and add `"mps"` to the `DeviceString` type
- Change the `--device` default from the hardcoded `"cuda"` to `None` so auto-detection picks the best available backend (CUDA > MPS > CPU)
- Replace `index_copy_` with MPS-compatible advanced indexing in the KV cache (`aten::index_copy.out` is not implemented for MPS in PyTorch 2.4)

Motivation

Running `python -m moshi.server` on macOS crashes with `AssertionError: Torch not compiled with CUDA enabled` because `--device` defaults to `"cuda"`. The MPS code path in `loaders.py` (loading safetensors via CPU then moving to MPS) was already implemented but unreachable.

Performance note
Real-time audio inference with the 7B model is memory-bandwidth-bound and will not reach real-time speeds on current Apple Silicon hardware (~150 GB/s of unified memory bandwidth vs X TB/s on datacenter GPUs). The server does run and produce output, but with noticeable latency. Quantization or a smaller model variant would be needed for real-time performance on MPS.
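A rough back-of-the-envelope for why bandwidth is the ceiling: if each decoded step must stream all model weights from memory, throughput is capped at roughly bandwidth divided by model size in bytes. All numbers below are assumptions for illustration, not measurements from this PR.

```python
# Back-of-the-envelope decode ceiling; every number here is an assumption.
params = 7e9               # 7B-parameter model
bytes_per_param = 2        # 16-bit weights
bandwidth = 150e9          # ~M3 Pro unified memory bandwidth, bytes/s

# Upper bound on decode steps/s if every step reads all weights once.
tokens_per_s = bandwidth / (params * bytes_per_param)
print(f"~{tokens_per_s:.1f} steps/s upper bound")
```

This ignores KV cache traffic and any compute overlap, so real throughput is lower still, which is consistent with the drift observed below.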
Update:

You can see that the drift of 45s is kinda brutal. 😅
Test plan