Support torchao MPS 4-bit quantization #197

Merged
JacobSzwejbka merged 2 commits into huggingface:main from manuelcandales:manuel/mps-int4-quant
Dec 17, 2025

@manuelcandales manuelcandales commented Dec 16, 2025

This pull request extends the quantization and device support of the ExecuTorch export pipeline:

  • Added fpa4w (floating point activation, 4-bit weight for MPS backend) as a valid choice for the --qlinear and --qlinear_encoder command-line arguments.
  • Added mps as a valid choice for the --device argument.
  • Integrated the UIntxWeightOnlyConfig from torchao.experimental.quant_api.

@larryliu0820 larryliu0820 self-requested a review December 16, 2025 20:34
"--qlinear",
type=str,
- choices=["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w"],
+ choices=["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w", "fpa4w"],
Collaborator

Does fpa4w work on backends other than metal?

Contributor Author

No, it only works with Metal

Collaborator

Can you add a check, then, that a user who passes --qlinear fpa4w also passes --device mps?

Contributor Author

done!
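
A minimal sketch of the kind of check requested above, assuming an argparse-based CLI. The names build_parser, validate_args, and MPS_ONLY_SCHEMES are hypothetical, the "cpu" device choice and the full --qlinear_encoder choice list are assumptions, and the check actually merged in this PR may look different.

```python
import argparse

# Schemes that, per the discussion above, only work with the Metal (MPS) backend.
MPS_ONLY_SCHEMES = {"fpa4w"}

# Choice list taken from the diff above; reusing it for the encoder is an assumption.
QLINEAR_CHOICES = ["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w", "fpa4w"]


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the flags relevant to this check."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qlinear", type=str, choices=QLINEAR_CHOICES, default=None)
    parser.add_argument("--qlinear_encoder", type=str, choices=QLINEAR_CHOICES, default=None)
    # "cpu" is an assumed placeholder; only "mps" is confirmed by the PR description.
    parser.add_argument("--device", type=str, choices=["cpu", "mps"], default="cpu")
    return parser


def validate_args(args: argparse.Namespace) -> None:
    """Reject MPS-only quantization schemes when the device is not mps."""
    for flag in ("qlinear", "qlinear_encoder"):
        scheme = getattr(args, flag)
        if scheme in MPS_ONLY_SCHEMES and args.device != "mps":
            raise ValueError(
                f"--{flag} {scheme} requires --device mps, got --device {args.device}"
            )
```

Running the validation right after parsing means an unsupported combination fails before any model loading or export work starts.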

)
if quant_config_key == "fpa4w":
    # Need to import to load the ops
    import torchao.experimental.ops.mps  # noqa: F401
Collaborator

nit: should we import this in torchao.experimental.quant_api, so that from torchao.experimental.quant_api import UIntxWeightOnlyConfig satisfies the import requirement?

Contributor Author

import torchao.experimental.ops.mps will raise an error if the op library isn't found. The Metal ops are not built in torchao by default. For that reason, I thought it would be clearer to have an explicit import that loads the ops, rather than loading them as a side effect of importing the config.

Contributor Author

That's also why I import torchao.experimental.ops.mps only if quant_config_key == "fpa4w".
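
The explicit, conditional import described in this thread can be sketched as a small helper. The name load_optional_ops is hypothetical and not part of torchao. The sketch assumes that a missing op library surfaces as an ImportError or OSError at import time, which may not cover every failure mode.

```python
import importlib
from types import ModuleType


def load_optional_ops(module_name: str, feature: str) -> ModuleType:
    """Import a module only for its side effect of registering custom ops.

    Keeping this call explicit (instead of hiding it inside another import)
    makes the failure point obvious when the ops were never built.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        raise RuntimeError(
            f"{feature} needs the ops registered by '{module_name}', "
            "but that module could not be loaded. It may not be built by default."
        ) from exc


# Hypothetical call site, mirroring the condition in the diff above:
# if quant_config_key == "fpa4w":
#     load_optional_ops("torchao.experimental.ops.mps", feature="fpa4w")
```

Guarding the call behind the fpa4w condition keeps every other quantization scheme usable on installs where the Metal ops are absent.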

@JacobSzwejbka JacobSzwejbka merged commit 96394e4 into huggingface:main Dec 17, 2025
43 of 83 checks passed
3 participants