[ggma] Add documentation for TinyLlama example #16283
glistening wants to merge 3 commits into Samsung:master.
I will add instructions on how to prepare the ggma package, build ggma, and run it.
- Created `runtime/ggma/examples/generate_text/tinyllama.md` with a step-by-step guide.
- Includes prerequisites, model generation commands, the full processing pipeline, and a summary.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
```python
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
circle_model = tico.convert(model, captured_input)
```
For other reviewers: you may encounter an export error related to `vmap_impl`. It occurs because `sdpa_mask_recent_torch` is no longer torch-exportable in transformers 4.54.0 through 4.57.1 (possibly in lower versions too; I checked only 4.54.0 and 4.57.1).
It can be resolved by using `transformers==4.50.3`, as the author specified in `requirements.txt`.
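A minimal sketch for checking the installed version before attempting the export, assuming the affected range is exactly the one reported above (4.54.0 through 4.57.1; other versions were not checked and may also fail). The helper names `parse_version` and `in_reported_bad_range` are hypothetical and are not part of TICO or transformers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Both endpoints were checked in the review comment; lower versions were NOT
# checked and may also fail to export.
REPORTED_BAD_RANGE = ((4, 54, 0), (4, 57, 1))
PINNED = "4.50.3"  # pin from the PR's requirements.txt


def parse_version(v: str) -> tuple:
    """Parse '4.57.1' or '4.57.1.dev0' into (4, 57, 1); suffixes are ignored."""
    parts = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_reported_bad_range(v: str) -> bool:
    """True if v lies within the range reported as not torch-exportable."""
    lo, hi = REPORTED_BAD_RANGE
    return lo <= parse_version(v) <= hi


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        if in_reported_bad_range(installed):
            print(f"transformers {installed} may hit the vmap_impl export error; "
                  f"try: pip install transformers=={PINNED}")
        else:
            print(f"transformers {installed} is outside the reported range "
                  "(untested versions may still fail)")
```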
```
PR_WORKTREE = "_pr_16233"
PR_BRANCH = "pr-16233"
PR_REF = "refs/pull/16233/head"
```
It will be removed once #16233 is merged.
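For readers unfamiliar with the pattern, here is a hedged sketch of how constants like these are typically used to check out a pull request that has not been merged yet. The diff does not show how the script consumes them, so `pr_checkout_commands` and `checkout_pr` are hypothetical helpers; the git commands themselves (`git fetch <remote> <ref>:<branch>` and `git worktree add <path> <branch>`) are standard.

```python
import subprocess

# Constants copied from the diff above.
PR_WORKTREE = "_pr_16233"
PR_BRANCH = "pr-16233"
PR_REF = "refs/pull/16233/head"


def pr_checkout_commands(remote: str = "origin") -> list:
    """Return the git commands that fetch the PR head into a local branch
    and check it out in a separate worktree (hypothetical helper)."""
    return [
        # Fetch GitHub's pull-request head ref into a local branch.
        ["git", "fetch", remote, f"{PR_REF}:{PR_BRANCH}"],
        # Check that branch out in its own directory; the main tree is untouched.
        ["git", "worktree", "add", PR_WORKTREE, PR_BRANCH],
    ]


def checkout_pr(remote: str = "origin") -> None:
    """Run the commands above, stopping at the first failure."""
    for cmd in pr_checkout_commands(remote):
        subprocess.run(cmd, check=True)
```

Once #16233 is merged, the worktree can be dropped with `git worktree remove _pr_16233`.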
```yaml
decode: |
  fuse.attention.py < decode_.circle
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4]
```
Later, the shape of `kv_cache` will be determined automatically from `config.json`.
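A sketch of what deriving the cache shape from `config.json` could look like. It assumes the Hugging Face layout `(batch, num_key_value_heads, seq_len, head_dim)` and the standard Llama config keys (`hidden_size`, `num_attention_heads`, optional `num_key_value_heads` and `head_dim`). The axis order the ggma pipeline uses after `reshape.io.py` and `transpose.io.kvcache.py` may differ, so treat the layout as an assumption; `kv_cache_shape` and `kv_cache_shape_from_file` are hypothetical helpers.

```python
import json


def kv_cache_shape(config: dict, batch: int, max_seq_len: int) -> tuple:
    """Per-layer key (or value) cache shape, assuming the Hugging Face layout
    (batch, num_key_value_heads, seq_len, head_dim)."""
    n_heads = config["num_attention_heads"]
    # Grouped-query attention models set num_key_value_heads; without it,
    # every attention head has its own K/V head.
    n_kv_heads = config.get("num_key_value_heads", n_heads)
    # Prefer an explicit head_dim; otherwise split hidden_size evenly.
    head_dim = config.get("head_dim") or config["hidden_size"] // n_heads
    return (batch, n_kv_heads, max_seq_len, head_dim)


def kv_cache_shape_from_file(path: str, batch: int, max_seq_len: int) -> tuple:
    """Read a Hugging Face style config.json and compute the cache shape."""
    with open(path) as f:
        return kv_cache_shape(json.load(f), batch, max_seq_len)
```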
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
```
onert does not allow a constant LHS operand for BatchMatMul.
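One common way to satisfy this constraint is the transpose identity `C @ X == (X^T @ C^T)^T`, which moves the constant to the right-hand side; `C^T` is itself a constant and can be folded ahead of time. Whether `fuse.bmm_lhs_const.py` uses exactly this rewrite is an assumption. The pure-Python sketch below checks the identity on a 2-D case; for BatchMatMul the same holds per batch, transposing only the last two axes. All function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def matmul_const_lhs_as_rhs(const_lhs, x):
    """Compute const_lhs @ x with the constant on the RHS:
    C @ X == (X^T @ C^T)^T, where C^T can be precomputed once."""
    const_t = transpose(const_lhs)  # foldable at conversion time
    return transpose(matmul(transpose(x), const_t))
```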
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
  | downcast.input_ids.py
```
I will use int32 instead of int64 (the default type TICO generates) for `input_ids`, which are passed to Gather as indices.
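A sketch of preparing such an input on the caller side, assuming the runtime expects a little-endian int32 buffer. int32 is sufficient because token ids are Gather indices into the embedding table and are therefore bounded by the vocabulary size. `pack_input_ids_int32` is a hypothetical helper, not part of ggma or onert.

```python
import struct

INT32_MAX = 2**31 - 1


def pack_input_ids_int32(ids, vocab_size):
    """Validate token ids and pack them as little-endian int32 bytes."""
    if vocab_size > INT32_MAX:
        raise ValueError("vocabulary does not fit in int32")
    for t in ids:
        # Each id must address a row of the embedding table used by Gather.
        if not 0 <= t < vocab_size:
            raise ValueError(f"token id {t} out of range [0, {vocab_size})")
    # '<' selects standard sizes, so 'i' is exactly 4 bytes on every platform.
    return struct.pack(f"<{len(ids)}i", *ids)
```

Compared with int64, this halves the size of the `input_ids` buffer without losing any representable token id.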
```yaml
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
  | downcast.input_ids.py
  | gc.py > model.circle
```
It removes unreachable inputs/outputs, tensors, buffers, and so on.
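A sketch of such a reachability pass on a toy graph representation. The model here is a plain dict, not the circle flatbuffer schema, and `collect_garbage` is hypothetical; `gc.py` may work differently. It walks backwards from the graph outputs, keeps every operator and tensor encountered on the way, drops graph inputs nothing depends on, and renumbers the buffers that are still referenced.

```python
def collect_garbage(model):
    """Return a copy of `model` without unreachable ops, tensors, buffers and inputs.

    Toy layout: {"inputs": [...], "outputs": [...],
                 "ops": [{"inputs": [...], "outputs": [...]}, ...],
                 "tensors": {tensor_name: buffer_index}, "buffers": [bytes, ...]}
    """
    ops = model["ops"]
    # Map each tensor to the index of the op that produces it.
    producer = {t: i for i, op in enumerate(ops) for t in op["outputs"]}
    live_ops, live_tensors = set(), set()
    stack = list(model["outputs"])
    while stack:
        t = stack.pop()
        if t in live_tensors:
            continue
        live_tensors.add(t)
        i = producer.get(t)
        if i is not None and i not in live_ops:
            live_ops.add(i)
            # A kept op keeps all of its outputs and needs all of its inputs.
            stack.extend(ops[i]["outputs"])
            stack.extend(ops[i]["inputs"])
    # Keep only buffers still referenced by a live tensor, renumbered densely.
    kept_bufs = sorted({model["tensors"][t] for t in live_tensors})
    new_index = {old: new for new, old in enumerate(kept_bufs)}
    return {
        "inputs": [t for t in model["inputs"] if t in live_tensors],
        "outputs": list(model["outputs"]),
        "ops": [op for i, op in enumerate(ops) if i in live_ops],  # order preserved
        "tensors": {t: new_index[b] for t, b in model["tensors"].items()
                    if t in live_tensors},
        "buffers": [model["buffers"][b] for b in kept_bufs],
    }
```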
```yaml
  | transpose.io.kvcache.py > decode.circle
```
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
```
It merges the two circle models into one.
In this phase, weight sharing is handled by pointing tensors with identical weight contents to the same buffer index.
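A sketch of content-based buffer sharing on toy dict models. The structures are not the circle schema, `merge_with_shared_buffers` is hypothetical, and `merge.circles.py` may compare weights differently. Buffers are compared byte for byte, so identical weights in prefill and decode resolve to the same index in the merged buffer list.

```python
def merge_with_shared_buffers(models):
    """Merge the buffers of several toy models, deduplicating by content.

    Each model is {"tensors": {name: buffer_index}, "buffers": [bytes, ...]}.
    Returns (merged_buffers, per_model_tensor_to_buffer_maps).
    """
    merged = []    # deduplicated buffer contents
    index_of = {}  # buffer content -> index in `merged`
    tensor_maps = []
    for m in models:
        remap = {}
        for old, data in enumerate(m["buffers"]):
            if data not in index_of:
                index_of[data] = len(merged)
                merged.append(data)
            # Identical contents from different models get the same index.
            remap[old] = index_of[data]
        tensor_maps.append({t: remap[b] for t, b in m["tensors"].items()})
    return merged, tensor_maps
```

Using the raw bytes as the dictionary key means two buffers are shared only when they are exactly equal, so there is no hash-collision risk.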