[quantization] Introduce wrapper for Qwen3VLVisionPatchEmbed #488
mhs4670go merged 1 commit into Samsung:main
Conversation
Force-pushed from 13abd1a to 0aac4e4
@@ -0,0 +1,236 @@
# Copyright (c) 2025 Samsung Electronics Co., Ltd. All Rights Reserved
Suggested change:
- # Copyright (c) 2025 Samsung Electronics Co., Ltd. All Rights Reserved
+ # Copyright (c) 2026 Samsung Electronics Co., Ltd. All Rights Reserved
cfg = Qwen3VLVisionConfig(
    hidden_size=1024,  # Match Qwen3-VL's hidden size
    spatial_merge_size=2,
    temporal_merge_size=2,
)
model = Qwen3VLVisionPatchEmbed(cfg)
(Just to note) Oh...? This model looks a bit different from my vision patch embed. Maybe because of spatial_merge_size...
@mhs4670go I cannot attach image files here; see here
Below is our target configuration for this layer; could you use this?
Qwen3VLVisionPatchEmbed(
(proj): Conv3d(3, 1024, kernel_size=(2, 16, 16), stride=(2, 16, 16))
)
'args': ('Tensor(shape=[468, 1536], dtype=torch.float32)',)
The reason is that your current example leaves some float32 ADD operators in the graph. (See #489 for details.)
We are planning to lower the above Conv3d operator into Conv2d+Reshape (@llFreetimell is working on it). The above specifics are derived from a use-case scenario (which is not 100% fixed for now, though).
Thus, it would be good to provide the quantization example with the above version.
(+ Do you have any specific reason for choosing this configuration of Qwen3VLVisionPatchEmbed?)
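For context, here is a minimal PyTorch sketch of a patch embed whose proj matches the target Conv3d above, fed with the [468, 1536] example input (1536 = 3 * 2 * 16 * 16). This is not the transformers implementation; the class name PatchEmbedSketch and its parameter names are invented for illustration.
import torch
import torch.nn as nn

class PatchEmbedSketch(nn.Module):
    def __init__(self, in_channels=3, hidden_size=1024, patch_size=16, temporal_patch_size=2):
        super().__init__()
        self.in_channels = in_channels
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        # Matches the target: Conv3d(3, 1024, kernel_size=(2, 16, 16), stride=(2, 16, 16))
        self.proj = nn.Conv3d(
            in_channels,
            hidden_size,
            kernel_size=(temporal_patch_size, patch_size, patch_size),
            stride=(temporal_patch_size, patch_size, patch_size),
        )

    def forward(self, x):
        # x: [num_patches, C * T * P * P] = [468, 3 * 2 * 16 * 16] = [468, 1536]
        x = x.view(-1, self.in_channels, self.temporal_patch_size, self.patch_size, self.patch_size)
        return self.proj(x).view(-1, self.proj.out_channels)  # -> [468, 1024]

emb = PatchEmbedSketch()
print(emb(torch.randn(468, 1536)).shape)  # torch.Size([468, 1024])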
> Below is our target configuration for this layer; could you use this?
@dayo09 👍 Thanks for noticing this! I've changed the example code and added assertions checking that the Conv3d has the right configuration.
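A hypothetical sketch of the kind of Conv3d configuration checks mentioned here (the actual assertions live in the PR's example script; the helper name is invented):
import torch

def check_patch_embed_proj(model) -> None:
    # Assert the patch embed's proj matches the target Conv3d configuration.
    proj = model.proj
    assert isinstance(proj, torch.nn.Conv3d)
    assert proj.in_channels == 3 and proj.out_channels == 1024
    assert tuple(proj.kernel_size) == (2, 16, 16)
    assert tuple(proj.stride) == (2, 16, 16)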
@dvsav Well, after applying the config, the graph remains the same. (I am sorry that I cannot show you the image; I am not yet permitted to upload images. I will sort that out soon to alleviate your inconvenience.)
The convolution's weight is lifted up as a constant input and is not constant-folded. I believe constant folding after quantization is required in this case. 😅
This change introduces the QuantQwen3VLVisionPatchEmbed wrapper to support post-training quantization of the Qwen3VLVisionPatchEmbed module.
TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
Force-pushed from 0aac4e4 to 20d536b
This change introduces the QuantQwen3VLVisionPatchEmbed wrapper to support post-training quantization of the Qwen3VLVisionPatchEmbed module.

Why?

The Qwen3VLVisionPatchEmbed module is used in the image encoder part of the Qwen model. Trying to quantize Qwen3VLVisionPatchEmbed via PTQ generates the exception PTQQuantizer: no quantization wrapper for Qwen3VLVisionPatchEmbed.
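For readers unfamiliar with the wrapper lookup that produces this error, the following is a purely illustrative sketch of a module-to-wrapper registry dispatch; it is not tico's actual code, and every name below except the quoted exception message is invented for illustration.
from torch import nn

# Illustrative registry: maps a module class to its quantization wrapper class.
_WRAPPERS: dict = {}

def register_wrapper(module_cls, wrapper_cls):
    _WRAPPERS[module_cls] = wrapper_cls

def wrap_for_ptq(module: nn.Module) -> nn.Module:
    wrapper_cls = _WRAPPERS.get(type(module))
    if wrapper_cls is None:
        # Without a registered wrapper, quantization fails as described above.
        raise RuntimeError(
            f"PTQQuantizer: no quantization wrapper for {type(module).__name__}"
        )
    return wrapper_cls(module)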
What

This change introduces:
- QuantQwen3VLVisionPatchEmbed (tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_patch_embed.py).
- class TestQuantQwen3VLTextAttention (test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_patch_embed.py) - skipped if the transformers package is not installed.
- quant_vision_patch_embed in _CORE_MODULES (tico/quantization/wrapq/wrappers/registry.py).
- Qwen3VLVisionPatchEmbed quantization and conversion to Circle (tico/quantization/wrapq/examples/qwen/quantize_qwen_vision_patch_embed.py).

Unit Tests
Unit test results with coverage information:
Coverage info (irrelevant files skipped):