
fix: enable MPS device support for macOS Apple Silicon#64

Open
Abeansits wants to merge 2 commits into NVIDIA:main from Abeansits:fix/enable-mps-device-support

Conversation


@Abeansits Abeansits commented Feb 23, 2026

Summary

  • Uncomment existing MPS backend detection in torch_auto_device() and add "mps" to the DeviceString type
  • Change --device default from hardcoded "cuda" to None so auto-detection picks the best available backend (CUDA > MPS > CPU)
  • Replace index_copy_ with MPS-compatible advanced indexing in the KV cache (aten::index_copy.out is not implemented for MPS in PyTorch 2.4)
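The auto-detection priority described above can be sketched as a small pure-Python stand-in. This is illustrative only: the helper name `pick_device` is made up, while the real `torch_auto_device()` queries `torch.cuda.is_available()` and `torch.backends.mps.is_available()` directly.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Stand-in for torch_auto_device(): prefer CUDA, then MPS, then CPU.

    In the real code the two flags come from torch.cuda.is_available()
    and torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With `--device` now defaulting to `None`, this fallback chain only runs when the user does not pass a device explicitly.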

Motivation

Running python -m moshi.server on macOS crashes with AssertionError: Torch not compiled with CUDA enabled because --device defaults to "cuda". The MPS code path in loaders.py (loading safetensors via CPU then moving to MPS) was already implemented but unreachable.

Performance note

Real-time audio inference with the 7B model is memory-bandwidth-bound and will not reach real-time speeds on current Apple Silicon hardware (~150 GB/s of unified-memory bandwidth on an M3 Pro vs X TB/s on datacenter GPUs). The server does run and produce output, but with noticeable latency. Quantization or a smaller model variant would be needed for real-time performance on MPS.
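A rough back-of-envelope makes the bandwidth bound concrete. The numbers below are illustrative assumptions, not measurements: it assumes every decode step streams all weights from memory once, and ignores KV-cache traffic and compute overlap.

```python
# Illustrative upper-bound estimate for memory-bandwidth-bound decoding.
params = 7e9                 # 7B parameters (assumed)
bytes_per_param = 2          # bf16/fp16 weights (assumed)
bandwidth = 150e9            # ~M3 Pro unified-memory bandwidth, bytes/s

step_time = params * bytes_per_param / bandwidth   # seconds per decode step
tokens_per_second = 1.0 / step_time
print(f"~{tokens_per_second:.1f} steps/s upper bound")
```

If the model's required frame rate exceeds this ceiling, output falls progressively behind real time, which matches the drift observed below.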

Update:
You can see that the drift of ~45 s is kinda brutal. 😅
[Screenshot 2026-02-23 at 15:45:02 — latency drift during playback]

Test plan

  • Server starts and auto-detects MPS device on M3 Pro (36 GB)
  • Model loads and runs inference (warmup completes without errors)
  • Audio output is produced (functional, though not real-time)
  • No changes to CUDA or CPU code paths

Abeansits and others added 2 commits February 23, 2026 14:36
Uncomment existing MPS backend detection in torch_auto_device() and add
"mps" to the DeviceString type. Change --device default from hardcoded
"cuda" to None so auto-detection picks the best available backend
(CUDA > MPS > CPU).

The MPS code path in loaders.py (loading safetensors via CPU then moving
to MPS) was already implemented but unreachable due to the server
entrypoint having MPS commented out.

Tested on M3 Pro (36 GB) with Python 3.12 / PyTorch 2.4.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
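The loaders.py MPS path the commit refers to boils down to loading onto CPU and then moving each tensor. A hedged sketch follows: the helper name `to_device` is hypothetical, and the real code loads the checkpoint with safetensors' `load_file(path, device="cpu")` before the move.

```python
def to_device(state_dict, device: str):
    """Move a checkpoint loaded on CPU to the target device.

    Sketch of the loaders.py MPS path: safetensors weights are loaded
    onto CPU first (e.g. safetensors.torch.load_file(path, device="cpu"))
    and then moved tensor by tensor, since loading directly onto MPS
    is not supported. Function name is illustrative, not the repo's.
    """
    return {name: tensor.to(device) for name, tensor in state_dict.items()}
```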
`aten::index_copy.out` is not implemented for the MPS device in
PyTorch 2.4. Replace with equivalent slice assignment in the KV cache
which is supported on all backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
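The replacement described in the commit amounts to a slice assignment along the time dimension. A minimal sketch, with the caveat that the (batch, heads, time, head_dim) layout and the function name are assumptions for illustration, not the repo's actual signature:

```python
def kv_cache_write(cache, new_kv, offset: int):
    """Write new_kv into cache at time offset without index_copy_.

    Original pattern (fails on MPS in PyTorch 2.4, where
    aten::index_copy.out has no MPS kernel):
        idx = torch.arange(offset, offset + T, device=cache.device)
        cache.index_copy_(2, idx, new_kv)
    Slice assignment over a contiguous index range is equivalent and
    supported on all backends. Assumed layout: (B, H, T, D), dim 2 = time.
    """
    T = new_kv.shape[2]
    cache[:, :, offset:offset + T] = new_kv
    return cache
```

Note the two forms are only interchangeable because the cache positions being written are contiguous; `index_copy_` with arbitrary indices would need a different rewrite.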
@Abeansits (Author)

@rajarshiroy-nvidia check it out when you have time

