[quantization] Introduce Qwen3VLVisionMLP wrapper #485
mhs4670go merged 3 commits into Samsung:main
Conversation
bbd856b to e280ab2
I believe AutoModelForVision2Seq is included in recent transformers versions.
Could you note your torch and transformers versions in a document somewhere?
We need to convert the examples into CI someday, and it will be helpful to have the versions written down.
@dayo09
Ahhh. It turns out that AutoModelForVision2Seq has been removed in v5.0. I believe I should change it to AutoModelForImageTextToText, which has been available in transformers since version 4.5.
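For reference, a minimal sketch of the switch, assuming a hypothetical checkpoint id (the checkpoint the example actually loads may differ):

import torch
from transformers import AutoModelForImageTextToText

# "Qwen/Qwen3-VL-4B-Instruct" is a placeholder id for illustration only.
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    torch_dtype=torch.float32,  # fp32 avoids the bfloat16 fake-quant issue discussed below
)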
@dayo09
Fixed. Am I supposed to support AutoModelForVision2Seq for versions below 4.5?
@stamalakhov Thanks so much!
> Fixed. Am I supposed to support AutoModelForVision2Seq for versions below 4.5?
No, I don't think so. I believe just mentioning the version is enough for now; it's only for later testing. We are exploring quantization/frontend-compilation feasibility for the Qwen3-VL model structure. It's not about deployment level yet. :-D
BTW, could you share your torch and transformers versions? I encountered this error when running the example. 😓
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████| 625/625 [00:00<00:00, 1913.24it/s, Materializing param=model.visual.pos_embed.weight]
Traceback (most recent call last):
File "/home/dayo/git/TICO/tico/quantization/wrapq/examples/quantize_qwen_vision_mlp.py", line 74, in <module>
int8_out = mlp_q(hidden)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/ptq_wrapper.py", line 46, in forward
return self.wrapped(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_mlp.py", line 76, in forward
fc1 = self.linear_fc1(x_q)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/ptq_wrapper.py", line 46, in forward
return self.wrapped(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dayo/git/TICO/tico/quantization/wrapq/wrappers/nn/quant_linear.py", line 57, in forward
w = self.obs_weight.fake_quant(w)
File "/home/dayo/git/TICO/tico/quantization/wrapq/observers/affine_base.py", line 152, in fake_quant
return torch.fake_quantize_per_channel_affine(
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch.
> No, I don't think so. I believe just mentioning the version is enough for now; it's only for later testing. We are exploring quantization/frontend-compilation feasibility for the Qwen3-VL model structure. It's not about deployment level yet. :-D
@dayo09
Got it. Thank you.
@stamalakhov Ah, it was caused by the module's dtype == torch.bfloat16.
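For context, a small sketch of the failure mode, assuming the internal assert comes from feeding bfloat16 tensors to the per-channel fake-quant op on CPU; casting to float32 first avoids it:

import torch

w = torch.randn(4, 8)                      # stand-in for a Linear weight
scale = w.abs().amax(dim=1) / 127.0        # per-output-channel scales
zp = torch.zeros(4, dtype=torch.int32)     # per-channel zero points

# torch.fake_quantize_per_channel_affine(w.to(torch.bfloat16), scale, zp, 0, -128, 127)  # can hit the assert above
w_fq = torch.fake_quantize_per_channel_affine(w.float(), scale, zp, 0, -128, 127)        # fp32 path works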
> BTW, could you share your torch and transformers versions? I encountered this error when running the example. 😓
Transformers ~ 4.57.6, torch ~ 2.10.0
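If it helps for the doc/CI note, a tiny sketch of recording the versions at runtime; the minimum bound below is my assumption based on this thread, not project policy:

import torch
import transformers
from packaging import version

print("torch", torch.__version__, "| transformers", transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.57"), \
    "example was exercised with transformers ~4.57 / torch ~2.10 (see this thread)"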
e280ab2 to 05e68d0
This commit adds a Qwen3VLVisionMLP wrapper and tests for it.
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
05e68d0 to ba32a17
tico/quantization/wrapq/examples/qwen/quantize_qwen_vision_mlp.py (outdated; conversation resolved)
Apply suggestions from code review
Co-authored-by: Dayoung Lee <dayoung.lee@samsung.com>
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
This commit adds a Qwen3VLVisionMLP wrapper and tests for it.
./ccex test -k quantization.wrapq.wrappers.qwen_vl.test_quant_vision_mlp
Draft: #484
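Not the TICO API, just a conceptual sketch in plain PyTorch of what the wrapper is roughly doing around the vision MLP (per-channel int8 fake-quant of the linear_fc1/linear_fc2 weights); the layer sizes and the GELU activation are assumptions for illustration:

import torch
import torch.nn as nn

class VisionMLPSketch(nn.Module):
    # mirrors the linear_fc1 -> act_fn -> linear_fc2 structure seen in the traceback above
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.linear_fc1 = nn.Linear(dim, hidden)
        self.act_fn = nn.GELU()
        self.linear_fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.linear_fc2(self.act_fn(self.linear_fc1(x)))

def fake_quant_weight_(linear):
    # per-output-channel int8 fake quantization of a Linear weight (fp32)
    w = linear.weight.detach().float()
    scale = w.abs().amax(dim=1) / 127.0
    zp = torch.zeros(w.size(0), dtype=torch.int32)
    linear.weight.data = torch.fake_quantize_per_channel_affine(w, scale, zp, 0, -128, 127)

mlp = VisionMLPSketch().float()
fake_quant_weight_(mlp.linear_fc1)
fake_quant_weight_(mlp.linear_fc2)
out = mlp(torch.randn(1, 16, 64))   # forward pass with fake-quantized weights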