Description
Hello!
I’m interested in NVFP4 W4A4 GEMM kernels, and while looking into related work I came across Qutlass, so I wanted to ask a question. First of all, thank you for releasing such an impressive piece of work — I think it will be very helpful for my research.
What I’m curious about is how global scaling is handled in NVFP4. As far as I understand, Qutlass is based on CUTLASS and performs NVFP4 @ NVFP4 GEMM operations.
Focusing on the weights only: given pre-computed quantized weights, local (block) scales, and a global scale, I was able to confirm that the local scales are applied to the quantized weights as a block-scaled operation on the tensor cores. However, I haven't been able to figure out how the global scale is applied or handled in this process, so I'm reaching out to ask.
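To make the question concrete, here is a minimal reference sketch of my current understanding of the weight dequantization, written in NumPy. The function name, the `BLOCK = 16` layout, and the shapes are my own assumptions for illustration, not the QuTLASS API; the comment marks exactly the step I'm unsure about.

```python
import numpy as np

BLOCK = 16  # assumed NVFP4 micro-block size along K for this sketch

def dequantize_nvfp4_weights(q_w, local_scales, global_scale):
    """Reference dequantization of pre-computed NVFP4 weights (illustrative only).

    q_w:          (N, K) decoded FP4 (E2M1) values, held as float for clarity
    local_scales: (N, K // BLOCK) per-block scales, held as float
    global_scale: scalar FP32 per-tensor scale
    """
    # Broadcast each block scale over its BLOCK-element slice of K.
    s = np.repeat(local_scales, BLOCK, axis=1)   # (N, K)
    # Block-scaled part: this is the step I could confirm runs on the tensor cores.
    w_block = q_w * s
    # Global scale: where/how this factor enters the GEMM is exactly my question
    # (e.g. applied in an epilogue vs. folded into the scales beforehand).
    return global_scale * w_block


# Tiny usage example with random data (shapes chosen only for illustration).
N, K = 4, 32
e2m1_values = np.array([-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
                        0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
q_w = np.random.choice(e2m1_values, size=(N, K))
local_scales = np.random.rand(N, K // BLOCK).astype(np.float32)
global_scale = np.float32(0.01)
W_ref = dequantize_nvfp4_weights(q_w, local_scales, global_scale)
```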
Could you possibly provide some clarification on this?
Thank you!