
[onert] Add Attention operator in circle_schema.fbs #16227

Merged
hseok-oh merged 1 commit into Samsung:master from glistening:schema_add_attention
Oct 23, 2025

Conversation

glistening (Contributor) commented Oct 22, 2025

It introduces the Attention operator in circle_schema.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee sg5.lee@samsung.com

With #16055 and #16056, I confirmed that the attention op works using GGMA. GGMA generates exactly the same sequence of tokens that the HuggingFace transformers (Python) implementation generated.

hseok-oh requested a review from a team October 22, 2025 10:36
hseok-oh (Contributor) left a comment


LGTM

glistening (Contributor, Author) commented

@Samsung/nncc_committers

It adds the ATTENTION operator, which corresponds to LlamaAttention in modeling_llama.py.

glistening (Contributor, Author) commented

Attention

Description of Input and Output

input

  • 0: state
  • 1: wq
  • 2: wk
  • 3: wv
  • 4: wo
  • 5: cos
  • 6: sin
  • 7: mask
  • 8: kcache
  • 9: vcache
  • 10: pos

output

  • 0: state

For details, please see #16055.

As of this writing, it supports LLaMA-style attention.
For standard attention (which injects position embeddings outside the decode layers), cos and sin will not be provided (that is, they will be null pointers). A rough sketch of how these inputs map onto LLaMA-style attention is shown below.
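
To make the roles of these inputs concrete, here is a minimal, single-head NumPy sketch of LLaMA-style attention under assumed shapes and an assumed cache layout. The function names (`rope`, `attention`), tensor shapes, and mask/cache conventions are illustrative only; they are not taken from the onert kernel or the circle schema.

```python
# Illustrative sketch only. Single head, no grouped-query attention. Assumed
# layouts: state [seq, dim], weights [dim, dim], caches [max_seq, dim],
# mask [seq, max_seq] (additive), cos/sin [seq, dim // 2], pos = tokens cached so far.
import numpy as np

def rope(x, cos, sin):
    # Rotary position embedding: rotate channel pairs by per-position angles.
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)

def attention(state, wq, wk, wv, wo, cos, sin, mask, kcache, vcache, pos):
    q = rope(state @ wq, cos, sin)           # 1: wq, 5: cos, 6: sin
    k = rope(state @ wk, cos, sin)           # 2: wk
    v = state @ wv                           # 3: wv

    # Write the new keys/values into the caches starting at `pos`, then attend
    # over everything cached so far.
    seq = state.shape[0]
    kcache[pos:pos + seq] = k                # 8: kcache
    vcache[pos:pos + seq] = v                # 9: vcache
    keys, values = kcache[:pos + seq], vcache[:pos + seq]

    # Scaled dot-product attention with an additive mask (7: mask).
    scores = (q @ keys.T) / np.sqrt(state.shape[-1]) + mask[:, :pos + seq]
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return (probs @ values) @ wo             # 4: wo -> output 0: state
```

In a prefill step, seq would cover the whole prompt and pos would be 0; in a decode step, seq is 1 and pos is the number of tokens already in the cache. For the standard-attention case mentioned above, the rope calls would simply be skipped when cos and sin are null.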

glistening (Contributor, Author) commented Oct 23, 2025

It does not use the frontends in this repo. The Attention op is generated using TICO (a newly introduced frontend), specifically using Samsung/TICO#217.

Again, it does not require any change in the luci-based frontends. It is a luci-free solution, like the recently merged RUN_MODEL [1].
The torch.export()-based frontend (TICO) generates the Attention operator, and onert supports the Attention kernel.

Footnotes

  1. https://github.com/Samsung/ONE/issues/15676#issuecomment-3204541259
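
For context, the following is a rough sketch of what a torch.export()-based capture looks like using only the standard PyTorch API. The `TinyAttention` module is a hypothetical placeholder; only `torch.export.export` itself is the real PyTorch entry point, and the actual lowering to circle lives in TICO (Samsung/TICO#217), not here.

```python
# Rough illustration of a torch.export()-based flow. TinyAttention is a placeholder
# module; the conversion of the captured graph to circle operators is what a frontend
# like TICO provides and is not shown here.
import torch
from torch.export import export

class TinyAttention(torch.nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim, bias=False)
        self.k = torch.nn.Linear(dim, dim, bias=False)
        self.v = torch.nn.Linear(dim, dim, bias=False)
        self.o = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        attn = torch.nn.functional.scaled_dot_product_attention(
            self.q(x), self.k(x), self.v(x), is_causal=True)
        return self.o(attn)

# torch.export captures the module as an ExportedProgram (a graph of ATen ops);
# a frontend built on it can then lower that graph to circle, which is roughly
# where a fused Attention builtin could be emitted.
example = (torch.randn(1, 8, 64),)
program = export(TinyAttention(), example)
print(program.graph_module)
```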

jinevening (Collaborator) left a comment


LGTM. Thank you.

hseok-oh merged commit 454601f into Samsung:master Oct 23, 2025
14 checks passed
glistening deleted the schema_add_attention branch October 23, 2025 02:37