[quantization] Decoder output quantization#458

Merged
mhs4670go merged 1 commit into Samsung:main from stamalakhov:decoder_res_2 on Feb 3, 2026

@stamalakhov
Contributor

@stamalakhov stamalakhov commented Feb 2, 2026

This PR ensures that the output of the decoder layer is quantized.

Currently, the output of the Llama decoder layer is left in float.
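As a rough illustration of what quantizing a layer output means, here is a minimal pure-Python sketch of per-tensor affine fake quantization applied to the result of a stand-in decoder layer. Every name in it (`compute_qparams`, `fake_quantize`, `decoder_layer_forward`) is hypothetical and chosen for illustration; it is not the code from this PR, and the actual wrapper may compute its quantization parameters differently.

```python
from typing import List, Tuple


def compute_qparams(lo: float, hi: float, bits: int = 8) -> Tuple[float, int]:
    """Affine (asymmetric) scale and zero-point for an observed range [lo, hi]."""
    qmin, qmax = 0, (1 << bits) - 1
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def fake_quantize(values: List[float], scale: float, zero_point: int,
                  bits: int = 8) -> List[float]:
    """Quantize to integers, clamp, then dequantize back to float."""
    qmin, qmax = 0, (1 << bits) - 1
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out


def decoder_layer_forward(hidden: List[float],
                          quantize_output: bool = True) -> List[float]:
    """Hypothetical stand-in for a decoder layer with an optional output quantizer."""
    # Placeholder for attention + MLP + residual additions (assumption).
    out = [h + 0.5 * h for h in hidden]
    if not quantize_output:
        return out  # plain float output
    scale, zero_point = compute_qparams(min(out), max(out))
    return fake_quantize(out, scale, zero_point)


hidden = [-1.0, 0.0, 0.25, 2.0]
float_out = decoder_layer_forward(hidden, quantize_output=False)
quant_out = decoder_layer_forward(hidden, quantize_output=True)
```

With the output quantizer enabled, each value is snapped to the nearest point of the 8-bit grid, so it differs from the float result by at most half a quantization step.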

./ccex test -k quantization.wrapq.wrappers.llama.test_quant_decoder_layer

RUN unit tests with -k quantization.wrapq.wrappers.llama.test_quant_decoder_layer ...
test_dtype_override (quantization.wrapq.wrappers.llama.test_quant_decoder_layer.TestQuantLlamaDecoderLayer) ... ok
test_forward_diff (quantization.wrapq.wrappers.llama.test_quant_decoder_layer.TestQuantLlamaDecoderLayer) ... ok
test_mode_transitions (quantization.wrapq.wrappers.llama.test_quant_decoder_layer.TestQuantLlamaDecoderLayer) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.054s

OK

Draft: #436
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

@stamalakhov stamalakhov self-assigned this Feb 2, 2026
@mhs4670go
Contributor

I missed that part when I implemented it. Thanks.

@stamalakhov
Contributor Author

> I missed that part when I implemented it. Thanks.

@mhs4670go
Glad to help.
Shall I provide tests for it?

@stamalakhov stamalakhov marked this pull request as ready for review February 3, 2026 05:04
@stamalakhov stamalakhov changed the title from "[quantization] [DRAFT] Decoder output quantization" to "[quantization] Decoder output quantization" Feb 3, 2026
@mhs4670go
Contributor

> Shall I provide tests for it?

Ah, how about adding a simple test for the observers, as the other wrappers have?

This PR ensures output of `decoder` layer is quantized.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
@stamalakhov
Contributor Author

> Shall I provide tests for it?
>
> Ah, how about adding a simple test for the observers, as the other wrappers have?

@mhs4670go Thank you.
I've added `test_dtype_override` for the only observer created in `QuantLlamaDecoderLayer`.
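For readers unfamiliar with this kind of check, a dtype-override test for a single observer might look roughly like the sketch below. All classes here (`ObserverSketch`, `QuantConfigSketch`, `QuantDecoderLayerSketch`) and the observer name `"output"` are hypothetical stand-ins written for illustration; they are not the project's real API, and the actual `test_dtype_override` may be structured differently.

```python
import unittest
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ObserverSketch:
    """Hypothetical observer: stores its name and the dtype it quantizes to."""
    name: str
    dtype: str = "uint8"


@dataclass
class QuantConfigSketch:
    """Hypothetical config: a default dtype plus per-observer overrides."""
    default_dtype: str = "uint8"
    overrides: Dict[str, str] = field(default_factory=dict)

    def dtype_for(self, observer_name: str) -> str:
        return self.overrides.get(observer_name, self.default_dtype)


class QuantDecoderLayerSketch:
    """Hypothetical wrapper that creates exactly one observer, for its output."""

    def __init__(self, qcfg: QuantConfigSketch):
        # "output" is an assumed observer name, used only in this sketch.
        self.obs_output = ObserverSketch("output", qcfg.dtype_for("output"))


class TestDtypeOverrideSketch(unittest.TestCase):
    def test_default_dtype(self):
        layer = QuantDecoderLayerSketch(QuantConfigSketch())
        self.assertEqual(layer.obs_output.dtype, "uint8")

    def test_dtype_override(self):
        qcfg = QuantConfigSketch(default_dtype="uint8",
                                 overrides={"output": "int16"})
        layer = QuantDecoderLayerSketch(qcfg)
        # The per-observer override should win over the default dtype.
        self.assertEqual(layer.obs_output.dtype, "int16")
```

The idea is simply that a per-observer setting in the configuration should take precedence over the default when the wrapper constructs its observer.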

Contributor

@mhs4670go mhs4670go left a comment

LGTM

@mhs4670go mhs4670go merged commit d8e57b8 into Samsung:main Feb 3, 2026
7 checks passed
@stamalakhov stamalakhov deleted the decoder_res_2 branch February 3, 2026 06:53