
fix: enable MPS device support for macOS Apple Silicon#64

Open
Abeansits wants to merge 2 commits into NVIDIA:main from Abeansits:fix/enable-mps-device-support

Conversation


@Abeansits Abeansits commented Feb 23, 2026

Summary

  • Uncomment existing MPS backend detection in torch_auto_device() and add "mps" to the DeviceString type
  • Change --device default from hardcoded "cuda" to None so auto-detection picks the best available backend (CUDA > MPS > CPU)
  • Replace index_copy_ with MPS-compatible advanced indexing in the KV cache (aten::index_copy.out is not implemented for MPS in PyTorch 2.4)
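The auto-detection priority described above can be sketched as a small pure-Python stand-in. This is illustrative only: the helper name `pick_device` is made up, while the real `torch_auto_device()` queries `torch.cuda.is_available()` and `torch.backends.mps.is_available()` directly.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Stand-in for torch_auto_device(): prefer CUDA, then MPS, then CPU.

    In the real code the two flags come from torch.cuda.is_available()
    and torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With `--device` now defaulting to `None`, this fallback chain only runs when the user does not pass a device explicitly.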

Motivation

Running python -m moshi.server on macOS crashes with AssertionError: Torch not compiled with CUDA enabled because --device defaults to "cuda". The MPS code path in loaders.py (loading safetensors via CPU then moving to MPS) was already implemented but unreachable.

Performance note

Real-time audio inference with the 7B model is memory-bandwidth-bound and will not reach real-time speeds on current Apple Silicon hardware (~150 GB/s of unified-memory bandwidth on an M3 Pro vs X TB/s on datacenter GPUs). The server does run and produce output, but with noticeable latency. Quantization or a smaller model variant would be needed for real-time performance on MPS.
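A rough back-of-envelope makes the bandwidth bound concrete. The numbers below are illustrative assumptions, not measurements: it assumes every decode step streams all weights from memory once, and ignores KV-cache traffic and compute overlap.

```python
# Illustrative upper-bound estimate for memory-bandwidth-bound decoding.
params = 7e9                 # 7B parameters (assumed)
bytes_per_param = 2          # bf16/fp16 weights (assumed)
bandwidth = 150e9            # ~M3 Pro unified-memory bandwidth, bytes/s

step_time = params * bytes_per_param / bandwidth   # seconds per decode step
tokens_per_second = 1.0 / step_time
print(f"~{tokens_per_second:.1f} steps/s upper bound")
```

If the model's required frame rate exceeds this ceiling, output falls progressively behind real time, which matches the drift observed below.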

Update:
You can see that the drift of ~45 s is kinda brutal. 😅
[Screenshot 2026-02-23 at 15:45:02 — latency drift during playback]

Test plan

  • Server starts and auto-detects MPS device on M3 Pro (36 GB)
  • Model loads and runs inference (warmup completes without errors)
  • Audio output is produced (functional, though not real-time)
  • No changes to CUDA or CPU code paths

Abeansits and others added 2 commits February 23, 2026 14:36
Uncomment existing MPS backend detection in torch_auto_device() and add
"mps" to the DeviceString type. Change --device default from hardcoded
"cuda" to None so auto-detection picks the best available backend
(CUDA > MPS > CPU).

The MPS code path in loaders.py (loading safetensors via CPU then moving
to MPS) was already implemented but unreachable due to the server
entrypoint having MPS commented out.

Tested on M3 Pro (36 GB) with Python 3.12 / PyTorch 2.4.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
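The loaders.py MPS path the commit refers to boils down to loading onto CPU and then moving each tensor. A hedged sketch follows: the helper name `to_device` is hypothetical, and the real code loads the checkpoint with safetensors' `load_file(path, device="cpu")` before the move.

```python
def to_device(state_dict, device: str):
    """Move a checkpoint loaded on CPU to the target device.

    Sketch of the loaders.py MPS path: safetensors weights are loaded
    onto CPU first (e.g. safetensors.torch.load_file(path, device="cpu"))
    and then moved tensor by tensor, since loading directly onto MPS
    is not supported. Function name is illustrative, not the repo's.
    """
    return {name: tensor.to(device) for name, tensor in state_dict.items()}
```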
`aten::index_copy.out` is not implemented for the MPS device in
PyTorch 2.4. Replace with equivalent slice assignment in the KV cache
which is supported on all backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
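The replacement described in the commit amounts to a slice assignment along the time dimension. A minimal sketch, with the caveat that the (batch, heads, time, head_dim) layout and the function name are assumptions for illustration, not the repo's actual signature:

```python
def kv_cache_write(cache, new_kv, offset: int):
    """Write new_kv into cache at time offset without index_copy_.

    Original pattern (fails on MPS in PyTorch 2.4, where
    aten::index_copy.out has no MPS kernel):
        idx = torch.arange(offset, offset + T, device=cache.device)
        cache.index_copy_(2, idx, new_kv)
    Slice assignment over a contiguous index range is equivalent and
    supported on all backends. Assumed layout: (B, H, T, D), dim 2 = time.
    """
    T = new_kv.shape[2]
    cache[:, :, offset:offset + T] = new_kv
    return cache
```

Note the two forms are only interchangeable because the cache positions being written are contiguous; `index_copy_` with arbitrary indices would need a different rewrite.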
@Abeansits (Author)

@rajarshiroy-nvidia check it out when you have time

