Description
Hello!
I’m interested in NVFP4 W4A4 GEMM kernels, and while looking into related work I came across Qutlass, so I wanted to ask a question. First of all, thank you for releasing such an impressive piece of work — I think it will be very helpful for my research.
What I’m curious about is how global scaling is handled in NVFP4. As far as I understand, Qutlass is based on CUTLASS and performs NVFP4 @ NVFP4 GEMM operations.
Focusing on the weights only: given pre-computed quantized weights, local (block) scales, and a global scale, I was able to confirm that the local scales are applied to the quantized weights as a block-scaled operation on the tensor cores. However, I haven't been able to figure out how the global scale is applied or handled in this process, so I'm reaching out to ask.
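To make the question concrete, here is a minimal reference sketch of my current understanding of the weight dequantization, written in NumPy. The function name, the `BLOCK = 16` layout, and the shapes are my own assumptions for illustration, not the QuTLASS API; the comment marks exactly the step I'm unsure about.

```python
import numpy as np

BLOCK = 16  # assumed NVFP4 micro-block size along K for this sketch

def dequantize_nvfp4_weights(q_w, local_scales, global_scale):
    """Reference dequantization of pre-computed NVFP4 weights (illustrative only).

    q_w:          (N, K) decoded FP4 (E2M1) values, held as float for clarity
    local_scales: (N, K // BLOCK) per-block scales, held as float
    global_scale: scalar FP32 per-tensor scale
    """
    # Broadcast each block scale over its BLOCK-element slice of K.
    s = np.repeat(local_scales, BLOCK, axis=1)   # (N, K)
    # Block-scaled part: this is the step I could confirm runs on the tensor cores.
    w_block = q_w * s
    # Global scale: where/how this factor enters the GEMM is exactly my question
    # (e.g. applied in an epilogue vs. folded into the scales beforehand).
    return global_scale * w_block


# Tiny usage example with random data (shapes chosen only for illustration).
N, K = 4, 32
e2m1_values = np.array([-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
                        0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
q_w = np.random.choice(e2m1_values, size=(N, K))
local_scales = np.random.rand(N, K // BLOCK).astype(np.float32)
global_scale = np.float32(0.01)
W_ref = dequantize_nvfp4_weights(q_w, local_scales, global_scale)
```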
Could you possibly provide some clarification on this?
Thank you!