[onert] Introduce Attention operator #16055
glistening wants to merge 2 commits into Samsung:master from
Conversation
std::vector<float> q_rope_buf(rope_out_shape.FlatSize());
std::vector<float> k_rope_buf(rope_out_shape.FlatSize());

nnfw::cker::RoPEMode rope_mode = nnfw::cker::RoPEMode::kGptNeox;
(This is not an issue with this draft.)
It may be better to use the names rotate_half and rotate_every_two instead of the model names GPT-NeoX and GPT-J.
GPT-NeoX: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L368
GPT-J: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/gptj/modeling_gptj.py#L69
Llama: https://github.com/huggingface/transformers/blob/fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db/src/transformers/models/llama/modeling_llama.py#L173
It might also be made a parameter of Attention.
I am already used to the name neox, since GPT-NeoX used the rotate_half approach first (hence "neox-style"); the name is also used in llama.cpp. But if rotate_half is more readable, renaming is fine.
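Not this PR's kernel code, just a minimal sketch of the two pairing schemes discussed above: rotate_half (GPT-NeoX/Llama style, corresponding to `kGptNeox`) pairs element `i` with element `i + dim/2`, while rotate_every_two (GPT-J style) pairs adjacent elements `2*i` and `2*i + 1`. The function names, signatures, and the per-position loop are illustrative assumptions, not nnfw::cker's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// NeoX / Llama style ("rotate_half"): pair element i with element i + dim/2.
void rope_rotate_half(std::vector<float> &x, std::size_t pos, float theta_base = 10000.f)
{
  const std::size_t dim = x.size();
  const std::size_t half = dim / 2;
  for (std::size_t i = 0; i < half; ++i)
  {
    const float freq = std::pow(theta_base, -2.f * static_cast<float>(i) / dim);
    const float c = std::cos(pos * freq), s = std::sin(pos * freq);
    const float a = x[i], b = x[i + half];
    x[i] = a * c - b * s;
    x[i + half] = a * s + b * c;
  }
}

// GPT-J style ("rotate_every_two"): pair element 2*i with element 2*i + 1.
void rope_rotate_every_two(std::vector<float> &x, std::size_t pos, float theta_base = 10000.f)
{
  const std::size_t dim = x.size();
  for (std::size_t i = 0; i < dim / 2; ++i)
  {
    const float freq = std::pow(theta_base, -2.f * static_cast<float>(i) / dim);
    const float c = std::cos(pos * freq), s = std::sin(pos * freq);
    const float a = x[2 * i], b = x[2 * i + 1];
    x[2 * i] = a * c - b * s;
    x[2 * i + 1] = a * s + b * c;
  }
}
```

Both apply the same per-pair rotation; only the pairing of the head dimensions differs, which is why either name describes the layout better than the model name.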
It introduces the Attention operator in circle_schema.
ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
It adds the Attention operator in IR, loader, and kernel.
ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
It introduces the Attention operator, which represents either LlamaAttention or standard attention.
For the former, it includes `RoPE`; for the latter, it follows the original Transformer paper. You can obtain a `decode.circle` containing `Attention` using TICO:
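For background only (this is neither onert's kernel nor the TICO export step): a minimal sketch of the standard attention path, single-head scaled dot-product attention softmax(QK^T / sqrt(d))V from the original Transformer paper. For the LlamaAttention path, RoPE (see the sketch earlier in this thread) would be applied to Q and K before this computation. All names and signatures are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// q, k, v are row-major [seq_len x dim]; returns the attention output [seq_len x dim].
std::vector<float> scaled_dot_product_attention(const std::vector<float> &q,
                                                const std::vector<float> &k,
                                                const std::vector<float> &v,
                                                std::size_t seq_len, std::size_t dim)
{
  std::vector<float> out(seq_len * dim, 0.f);
  std::vector<float> scores(seq_len);
  const float scale = 1.f / std::sqrt(static_cast<float>(dim));
  for (std::size_t i = 0; i < seq_len; ++i)
  {
    // scores[j] = (q_i . k_j) / sqrt(dim)
    float max_score = -std::numeric_limits<float>::infinity();
    for (std::size_t j = 0; j < seq_len; ++j)
    {
      float dot = 0.f;
      for (std::size_t d = 0; d < dim; ++d)
        dot += q[i * dim + d] * k[j * dim + d];
      scores[j] = dot * scale;
      max_score = std::max(max_score, scores[j]);
    }
    // Numerically stabilized softmax over the scores.
    float sum = 0.f;
    for (std::size_t j = 0; j < seq_len; ++j)
    {
      scores[j] = std::exp(scores[j] - max_score);
      sum += scores[j];
    }
    // Weighted sum of value rows.
    for (std::size_t j = 0; j < seq_len; ++j)
      for (std::size_t d = 0; d < dim; ++d)
        out[i * dim + d] += (scores[j] / sum) * v[j * dim + d];
  }
  return out;
}
```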