[onert] Introduce Attention operator #16055
glistening wants to merge 2 commits into Samsung:master from
Conversation
std::vector<float> q_rope_buf(rope_out_shape.FlatSize());
std::vector<float> k_rope_buf(rope_out_shape.FlatSize());

nnfw::cker::RoPEMode rope_mode = nnfw::cker::RoPEMode::kGptNeox;
(This is not an issue with this draft.)
It may be better to use the names rotate_half and rotate_every_two instead of the model names GPT-NeoX and GPT-J.
GPT-NeoX: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L368
GPT-J: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/gptj/modeling_gptj.py#L69
Llama: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/llama/modeling_llama.py#L173
It might also be made a parameter of Attention.
I am already used to the name neox, since GPT-NeoX used the rotate_half approach first (hence "neox-style"); the name is also used in llama.cpp. But if rotate_half is more readable, renaming is fine.
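Not this PR's kernel code, just a minimal sketch of the two pairing schemes discussed above: rotate_half (GPT-NeoX/Llama style, corresponding to `kGptNeox`) pairs element `i` with element `i + dim/2`, while rotate_every_two (GPT-J style) pairs adjacent elements `2*i` and `2*i + 1`. The function names, signatures, and the per-position loop are illustrative assumptions, not nnfw::cker's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// NeoX / Llama style ("rotate_half"): pair element i with element i + dim/2.
void rope_rotate_half(std::vector<float> &x, std::size_t pos, float theta_base = 10000.f)
{
  const std::size_t dim = x.size();
  const std::size_t half = dim / 2;
  for (std::size_t i = 0; i < half; ++i)
  {
    const float freq = std::pow(theta_base, -2.f * static_cast<float>(i) / dim);
    const float c = std::cos(pos * freq), s = std::sin(pos * freq);
    const float a = x[i], b = x[i + half];
    x[i] = a * c - b * s;
    x[i + half] = a * s + b * c;
  }
}

// GPT-J style ("rotate_every_two"): pair element 2*i with element 2*i + 1.
void rope_rotate_every_two(std::vector<float> &x, std::size_t pos, float theta_base = 10000.f)
{
  const std::size_t dim = x.size();
  for (std::size_t i = 0; i < dim / 2; ++i)
  {
    const float freq = std::pow(theta_base, -2.f * static_cast<float>(i) / dim);
    const float c = std::cos(pos * freq), s = std::sin(pos * freq);
    const float a = x[2 * i], b = x[2 * i + 1];
    x[2 * i] = a * c - b * s;
    x[2 * i + 1] = a * s + b * c;
  }
}
```

Both apply the same per-pair rotation; only the pairing of the head dimensions differs, which is why either name describes the layout better than the model name.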
It introduces the Attention operator in circle_schema.
ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
It adds the Attention operator in IR, loader, and kernel.
ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
It introduces the Attention operator, which represents either LlamaAttention or standard attention.
For the former, it includes `RoPE`; for the latter, it follows the original Transformer paper. You can obtain a `decode.circle` containing `Attention` using TICO:
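For background only (this is neither onert's kernel nor the TICO export step): a minimal sketch of the standard attention path, single-head scaled dot-product attention softmax(QK^T / sqrt(d))V from the original Transformer paper. For the LlamaAttention path, RoPE (see the sketch earlier in this thread) would be applied to Q and K before this computation. All names and signatures are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// q, k, v are row-major [seq_len x dim]; returns the attention output [seq_len x dim].
std::vector<float> scaled_dot_product_attention(const std::vector<float> &q,
                                                const std::vector<float> &k,
                                                const std::vector<float> &v,
                                                std::size_t seq_len, std::size_t dim)
{
  std::vector<float> out(seq_len * dim, 0.f);
  std::vector<float> scores(seq_len);
  const float scale = 1.f / std::sqrt(static_cast<float>(dim));
  for (std::size_t i = 0; i < seq_len; ++i)
  {
    // scores[j] = (q_i . k_j) / sqrt(dim)
    float max_score = -std::numeric_limits<float>::infinity();
    for (std::size_t j = 0; j < seq_len; ++j)
    {
      float dot = 0.f;
      for (std::size_t d = 0; d < dim; ++d)
        dot += q[i * dim + d] * k[j * dim + d];
      scores[j] = dot * scale;
      max_score = std::max(max_score, scores[j]);
    }
    // Numerically stabilized softmax over the scores.
    float sum = 0.f;
    for (std::size_t j = 0; j < seq_len; ++j)
    {
      scores[j] = std::exp(scores[j] - max_score);
      sum += scores[j];
    }
    // Weighted sum of value rows.
    for (std::size_t j = 0; j < seq_len; ++j)
      for (std::size_t d = 0; d < dim; ++d)
        out[i * dim + d] += (scores[j] / sum) * v[j * dim + d];
  }
  return out;
}
```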