How to handle global scale in dequantization phase? #9

@songsm921

Description

Hello!

I’m interested in NVFP4 W4A4 GEMM kernels, and while looking into related work I came across Qutlass, so I wanted to ask a question. First of all, thank you for releasing such an impressive piece of work — I think it will be very helpful for my research.

What I’m curious about is how global scaling is handled in NVFP4. As far as I understand, Qutlass is based on CUTLASS and performs NVFP4 @ NVFP4 GEMM operations.

Focusing on the weights only: assuming the quantized weights, local scales, and global scales are all pre-computed, I was able to confirm that the dequantization combining the local scales with the quantized weights is performed as a block-scaled operation on the tensor cores. However, I haven't been able to figure out how the global scale is applied in this process, so I'm reaching out to ask.
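To make sure we are talking about the same thing, here is a minimal NumPy sketch of the dequantization I have in mind. This is purely illustrative and not QuTLASS code; the function name `dequantize_nvfp4` and the array layout are my own assumptions. It just expresses that each FP4 value is scaled by its 16-element block's E4M3 scale and then by the single FP32 per-tensor global scale.

```python
# Illustrative sketch only (my own assumption of the math, not QuTLASS's implementation).
import numpy as np

BLOCK = 16  # NVFP4 block size along the reduction dimension


def dequantize_nvfp4(q, s_block, s_global):
    """Reference dequantization: W ~= s_global * s_block * q.

    q        -- (M, K) FP4 values already decoded to float
    s_block  -- (M, K // BLOCK) block scales decoded from E4M3 to float
    s_global -- scalar FP32 per-tensor global scale
    """
    # Broadcast each block scale over its 16 contiguous elements.
    s = np.repeat(s_block, BLOCK, axis=1)
    return s_global * s * q


# Toy example: one row, one 16-element block.
q = np.array([[1.0, -2.0] + [0.0] * 14])      # decoded FP4 values
s_block = np.array([[0.5]])                    # decoded E4M3 block scale
s_global = 2.0                                 # FP32 per-tensor scale
print(dequantize_nvfp4(q, s_block, s_global))  # ~= [[1.0, -2.0, 0.0, ...]]
```

Since the global scale is a single per-tensor scalar, my guess is that it could in principle be applied outside the block-scaled main loop (for example, folded into the epilogue), but I couldn't confirm where this actually happens in the kernel.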

Could you possibly provide some clarification on this?
Thank you!
