Description
In your matmul-pingpong-v1.cu, there is this code:
    /// make shared memory descriptor
    template <class PointerType>
    DEVICE GmmaDescriptor make_smem_desc(PointerType smem_ptr) {
      GmmaDescriptor desc;
      uint32_t uint_ptr = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
      desc.bitfield.start_address_ = uint_ptr >> 4;
      desc.bitfield.layout_type_ = 0x1;          /// swizzle 128B because we use Swizzle<3,4,3>
      desc.bitfield.leading_byte_offset_ = 0x1;  /// no use
      desc.bitfield.stride_byte_offset_ = 64;    /// how many 128bits-rows needed between two core matrices
      desc.bitfield.base_offset_ = 0x0;
      return desc;
    }
I don't understand why leading_byte_offset_ = 0x1 and stride_byte_offset_ = 64. From the NVIDIA PTX docs, I know that:
Leading dimension byte offset of matrix A or B is the distance, in bytes, between two adjacent core matrices in the K dimension.
Stride dimension byte offset of matrix A or B is the distance, in bytes, between two adjacent core matrices in the M or N dimension.
But I don't understand why they are 0x1 and 64 for the 128B swizzle mode. Maybe I have misunderstood the leading dimension byte offset and the stride dimension byte offset. Could you please explain?
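To make the question concrete, here is a minimal host-side sketch of the arithmetic I suspect is behind the value 64. It rests on two assumptions that are not confirmed by the code above: that the descriptor offset fields store a byte offset with its low 4 bits dropped (that is, in units of 16 bytes, which would match the "128bits-rows" comment), and that with 128B swizzle two adjacent core matrices in the M or N dimension are 8 rows of 128 bytes apart. The names encode_desc_field, encode_desc_field_checked, kSwizzleRowBytes, kRowsPerCoreMatrix, and kStrideBytes are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: an offset field keeps the byte offset with its four
// least-significant bits dropped, i.e. it counts 16-byte (128-bit) units.
constexpr uint32_t encode_desc_field(uint32_t byte_offset) {
    return (byte_offset & 0x3FFFFu) >> 4;
}

// Runtime variant that also checks the 16-byte alignment this encoding assumes.
inline uint32_t encode_desc_field_checked(uint32_t byte_offset) {
    assert(byte_offset % 16 == 0);
    return encode_desc_field(byte_offset);
}

// Assumption: with 128B swizzle each row spans 128 bytes and a core matrix
// covers 8 rows, so adjacent core matrices along M/N are 8 * 128 bytes apart.
constexpr uint32_t kSwizzleRowBytes   = 128;
constexpr uint32_t kRowsPerCoreMatrix = 8;
constexpr uint32_t kStrideBytes = kSwizzleRowBytes * kRowsPerCoreMatrix;  // 1024

// Under these assumptions the encoded stride equals the 64 used in the code.
static_assert(encode_desc_field(kStrideBytes) == 64,
              "1024 bytes in 16-byte units should be 64");
```

If this reading is right, stride_byte_offset_ = 64 means 1024 bytes, and leading_byte_offset_ = 0x1 would decode to only 16 bytes, which fits the "no use" comment: it looks like a placeholder rather than a real distance. I would appreciate confirmation of whether that is the intended interpretation.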