
[onert] Introduce Attention operator #16055

Closed
glistening wants to merge 2 commits into Samsung:master from glistening:op_attention

Conversation

@glistening (Contributor) commented Sep 11, 2025

This PR introduces the Attention operator, which represents either LlamaAttention or standard attention. For the former, it includes RoPE; for the latter, it follows the original Transformer paper ("Attention Is All You Need").
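
For reference, standard attention computes softmax(QK^T / sqrt(d_k)) V per head, and LlamaAttention additionally applies RoPE to Q and K. Below is a minimal single-head C++ sketch of the standard computation; it is illustrative only (the function name and flat row-major layout are assumptions, not onert APIs, and masking is omitted):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative single-head scaled dot-product attention (not onert code).
// q: [seq_q, d], k and v: [seq_kv, d], all row-major; returns [seq_q, d].
std::vector<float> scaled_dot_product_attention(const std::vector<float> &q,
                                                const std::vector<float> &k,
                                                const std::vector<float> &v,
                                                int seq_q, int seq_kv, int d)
{
  std::vector<float> out(seq_q * d, 0.0f);
  std::vector<float> scores(seq_kv);
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  for (int i = 0; i < seq_q; ++i)
  {
    // scores[j] = (q_i . k_j) / sqrt(d)
    float max_score = -INFINITY;
    for (int j = 0; j < seq_kv; ++j)
    {
      float s = 0.0f;
      for (int c = 0; c < d; ++c)
        s += q[i * d + c] * k[j * d + c];
      scores[j] = s * scale;
      max_score = std::max(max_score, scores[j]);
    }
    // Numerically stable softmax over the scores
    float sum = 0.0f;
    for (int j = 0; j < seq_kv; ++j)
    {
      scores[j] = std::exp(scores[j] - max_score);
      sum += scores[j];
    }
    // out_i = sum_j softmax(scores)_j * v_j
    for (int j = 0; j < seq_kv; ++j)
    {
      const float w = scores[j] / sum;
      for (int c = 0; c < d; ++c)
        out[i * d + c] += w * v[j * d + c];
    }
  }
  return out;
}
```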

You can obtain a decode.circle containing Attention using TICO.

@glistening changed the title from "[onert] Introduces Attention operator" to "[onert] Introduce Attention operator" on Sep 11, 2025
@glistening force-pushed the op_attention branch 3 times, most recently from 5c56bc0 to 4b014b9 on September 16, 2025 at 05:00
@glistening force-pushed the op_attention branch 5 times, most recently from dbef18f to 255b45f on September 23, 2025 at 08:15
@glistening force-pushed the op_attention branch 4 times, most recently from 0081bcd to d46f926 on October 1, 2025 at 02:53
```cpp
// Scratch buffers for the RoPE-rotated query and key tensors
std::vector<float> q_rope_buf(rope_out_shape.FlatSize());
std::vector<float> k_rope_buf(rope_out_shape.FlatSize());

// neox-style RoPE (rotate_half); see the review thread below
nnfw::cker::RoPEMode rope_mode = nnfw::cker::RoPEMode::kGptNeox;
```
@hseok-oh (Contributor) commented Oct 16, 2025

@glistening (Contributor, Author) replied

I am already used to the neox name, since GPT-NeoX used the rotate_half approach first; kGptNeox means neox-style. The name is also used in llama.cpp. But if rotate_half looks more readable, it is fine to rename.
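
For context, the distinction under discussion: neox-style (rotate_half) RoPE rotates element pairs (x[i], x[i + dim/2]), while the original GPT-J style rotates interleaved pairs (x[2i], x[2i+1]). A minimal C++ sketch of the rotate_half variant, assuming a single head and the common theta base of 10000; this is an illustration, not the nnfw::cker implementation:

```cpp
#include <cmath>
#include <vector>

// Illustrative neox-style (rotate_half) RoPE for one token position.
// x is one attention head of even size `dim`; each pair (x[i], x[i + dim/2])
// is rotated by the angle pos * theta_base^(-2i/dim).
void rope_rotate_half(std::vector<float> &x, int pos, float theta_base = 10000.0f)
{
  const int dim = static_cast<int>(x.size());
  const int half = dim / 2;
  for (int i = 0; i < half; ++i)
  {
    const float angle = pos * std::pow(theta_base, -2.0f * i / dim);
    const float c = std::cos(angle), s = std::sin(angle);
    const float x0 = x[i], x1 = x[i + half];
    x[i] = x0 * c - x1 * s;        // "real" component of the rotation
    x[i + half] = x0 * s + x1 * c; // "imaginary" component
  }
}
```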

@glistening force-pushed the op_attention branch 7 times, most recently from 7ac4e8e to 3ef9889 on October 21, 2025 at 01:09
It introduces the Attention operator in circle_schema.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
@glistening (Contributor, Author) commented

  • rebased (as circle_schema.fbs changed and bcqunembedding was introduced)
  • layer_idx was removed

@glistening force-pushed the op_attention branch 2 times, most recently from eee1464 to 6c0b8c9 on October 21, 2025 at 05:46
It adds the Attention operator to the IR, loader, and kernel.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>

Labels

PR/ready for review: It is ready to review. Please review it.
