[quantization] Introduce Qwen3VLVisionMLP wrapper #485
mhs4670go merged 3 commits into Samsung:main
Conversation
bbd856b to e280ab2
I believe AutoModelForVision2Seq is included in recent transformers versions.
Could you note your torch and transformers versions in a document somewhere?
We need to convert the examples into CI someday, and it will be helpful to have the versions written down.
@dayo09
Ahhh. It turns out that AutoModelForVision2Seq has been removed in v5.0. I believe I should change it to AutoModelForImageTextToText, which has been available in transformers since version 4.5.
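For reference, a minimal sketch of the switch, assuming a hypothetical checkpoint id (the checkpoint the example actually loads may differ):

import torch
from transformers import AutoModelForImageTextToText

# "Qwen/Qwen3-VL-4B-Instruct" is a placeholder id for illustration only.
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    torch_dtype=torch.float32,  # fp32 avoids the bfloat16 fake-quant issue discussed below
)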
@dayo09
Fixed. Am I supposed to support AutoModelForVision2Seq for versions below 4.5?
@stamalakhov Thanks so much!
> Fixed. Am I supposed to support AutoModelForVision2Seq for versions below 4.5?
No, I don't think so. I believe just mentioning the version is enough for now; it's only for later testing. We are exploring quantization/frontend-compilation feasibility for the Qwen3-VL model structure. It's not about deployment level yet. :-D
BTW, could you share your torch and transformers versions? I encountered this error when running the example. 😓
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████| 625/625 [00:00<00:00, 1913.24it/s, Materializing param=model.visual.pos_embed.weight]
Traceback (most recent call last):
File "/home/dayo/git/TICO/tico/quantization/wrapq/examples/quantize_qwen_vision_mlp.py", line 74, in <module>
int8_out = mlp_q(hidden)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/ptq_wrapper.py", line 46, in forward
return self.wrapped(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_mlp.py", line 76, in forward
fc1 = self.linear_fc1(x_q)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/ptq_wrapper.py", line 46, in forward
return self.wrapped(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/nn/quant_linear.py", line 57, in forward
w = self.obs_weight.fake_quant(w)
File "/home/dayo/git/TICO/tico/quantization/wrapq/observers/affine_base.py", line 152, in fake_quant
return torch.fake_quantize_per_channel_affine(
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch.
> No, I don't think so. I believe just mentioning the version is enough for now; it's only for later testing. We are exploring quantization/frontend-compilation feasibility for the Qwen3-VL model structure. It's not about deployment level yet. :-D
@dayo09
Got it. Thank you.
@stamalakhov Ah, it was caused by the module's dtype == torch.bfloat16.
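For context, a small sketch of the failure mode, assuming the internal assert comes from feeding bfloat16 tensors to the per-channel fake-quant op on CPU; casting to float32 first avoids it:

import torch

w = torch.randn(4, 8)                      # stand-in for a Linear weight
scale = w.abs().amax(dim=1) / 127.0        # per-output-channel scales
zp = torch.zeros(4, dtype=torch.int32)     # per-channel zero points

# torch.fake_quantize_per_channel_affine(w.to(torch.bfloat16), scale, zp, 0, -128, 127)  # can hit the assert above
w_fq = torch.fake_quantize_per_channel_affine(w.float(), scale, zp, 0, -128, 127)        # fp32 path works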
> BTW, could you share your torch and transformers versions? I encountered this error when running the example. 😓
Transformers ~ 4.57.6, torch ~ 2.10.0
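If it helps for the doc/CI note, a tiny sketch of recording the versions at runtime; the minimum bound below is my assumption based on this thread, not project policy:

import torch
import transformers
from packaging import version

print("torch", torch.__version__, "| transformers", transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.57"), \
    "example was exercised with transformers ~4.57 / torch ~2.10 (see this thread)"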
e280ab2 to 05e68d0
This commit adds a Qwen3VLVisionMLP wrapper and tests for it.
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
05e68d0 to ba32a17
tico/quantization/wrapq/examples/qwen/quantize_qwen_vision_mlp.py (outdated; conversation resolved)
Apply suggestions from code review
Co-authored-by: Dayoung Lee <dayoung.lee@samsung.com>
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
This commit adds a Qwen3VLVisionMLP wrapper and tests for it.
./ccex test -k quantization.wrapq.wrappers.qwen_vl.test_quant_vision_mlp
Draft: #484
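Not the TICO API, just a conceptual sketch in plain PyTorch of what the wrapper is roughly doing around the vision MLP (per-channel int8 fake-quant of the linear_fc1/linear_fc2 weights); the layer sizes and the GELU activation are assumptions for illustration:

import torch
import torch.nn as nn

class VisionMLPSketch(nn.Module):
    # mirrors the linear_fc1 -> act_fn -> linear_fc2 structure seen in the traceback above
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.linear_fc1 = nn.Linear(dim, hidden)
        self.act_fn = nn.GELU()
        self.linear_fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.linear_fc2(self.act_fn(self.linear_fc1(x)))

def fake_quant_weight_(linear):
    # per-output-channel int8 fake quantization of a Linear weight (fp32)
    w = linear.weight.detach().float()
    scale = w.abs().amax(dim=1) / 127.0
    zp = torch.zeros(w.size(0), dtype=torch.int32)
    linear.weight.data = torch.fake_quantize_per_channel_affine(w, scale, zp, 0, -128, 127)

mlp = VisionMLPSketch().float()
fake_quant_weight_(mlp.linear_fc1)
fake_quant_weight_(mlp.linear_fc2)
out = mlp(torch.randn(1, 16, 64))   # forward pass with fake-quantized weights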