Support torchao MPS 4-bit quantization #197
     "--qlinear",
     type=str,
-    choices=["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w"],
+    choices=["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w", "fpa4w"],
Does fpa4w work on backends other than Metal?
No, it only works with Metal.
Can you add a check then, to verify that a user passes --qlinear fpa4w and --device mps at the same time?
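A minimal sketch of what such a check could look like, assuming an argparse-based CLI. The `--qlinear` choices are taken from this PR; the `--device` choices other than `mps`, the default values, and the helper names `build_parser` and `validate_args` are hypothetical and only serve the illustration.

```python
import argparse

# The --qlinear choices mirror this PR. The --device list is an assumption:
# only "mps" is confirmed by the PR description.
QLINEAR_CHOICES = ["8da4w", "4w", "8w", "8da8w", "8da4w,8da8w", "fpa4w"]
DEVICE_CHOICES = ["cpu", "cuda", "mps"]


def build_parser() -> argparse.ArgumentParser:
    """Build a small parser with the two arguments discussed in the thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qlinear", type=str, choices=QLINEAR_CHOICES, default=None)
    parser.add_argument("--device", type=str, choices=DEVICE_CHOICES, default="cpu")
    return parser


def validate_args(args: argparse.Namespace) -> None:
    """Reject fpa4w unless the target device is mps (hypothetical helper)."""
    if args.qlinear == "fpa4w" and args.device != "mps":
        raise ValueError(
            "--qlinear fpa4w is only supported with --device mps, "
            f"got --device {args.device}"
        )
```

Running the validation right after `parse_args()` would surface the mismatch before any export work starts, instead of failing later inside the Metal op.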
    )
    if quant_config_key == "fpa4w":
        # Need to import to load the ops
        import torchao.experimental.ops.mps  # noqa: F401
nit: should we import this in torchao.experimental.quant_api, so that from torchao.experimental.quant_api import UIntxWeightOnlyConfig satisfies the import requirement?
import torchao.experimental.ops.mps will raise an error if the op library isn't found. The Metal ops are not built in torchao by default. For that reason, I thought it would be clearer to have an explicit import that loads the ops, rather than loading them as a side effect of importing the config.
That's also why I import torchao.experimental.ops.mps only if quant_config_key == "fpa4w".
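The explicit, conditional import described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the helper names `load_op_library` and `prepare_quant_backend` are hypothetical, and since the thread does not say which exception type the failed import raises, the sketch catches `Exception` broadly and re-raises with a clearer message.

```python
import importlib


def load_op_library(module_name: str) -> None:
    """Import a module only for its side effect of registering ops."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # exact exception type is an assumption
        raise RuntimeError(
            f"Could not load op library '{module_name}'. "
            "It may not have been built with this installation."
        ) from exc


def prepare_quant_backend(
    quant_config_key: str,
    ops_module: str = "torchao.experimental.ops.mps",
) -> bool:
    """Load the ops module only when the fpa4w config is requested.

    Returns True if the ops module was loaded, False if it was not needed.
    """
    if quant_config_key == "fpa4w":
        load_op_library(ops_module)
        return True
    return False
```

Keeping the import behind the `fpa4w` branch means users of the other quantization configs never touch the optional Metal op library, so a build without it keeps working for them.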
This pull request extends the quantization and device support for the ExecuTorch export pipeline:
- Adds fpa4w (floating point activation, 4-bit weight for MPS backend) as a valid choice for the --qlinear and --qlinear_encoder command-line arguments.
- Adds mps as a valid choice for the --device argument.
- Uses UIntxWeightOnlyConfig from torchao.experimental.quant_api.