What?
Previously, KV Cache export was not supported, mainly for two reasons.
(1) torch.export doesn't allow input mutation (e.g. in torch 2.7)
- This is resolved in torch 2.10
(2) ONE CIRCLE doesn't allow in-memory buffer updates.
- TBD (Will they provide support?)
If the above limitations are lifted, we could directly convert the KV Cache into the corresponding Circle.
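A minimal sketch of the export side of (1) and (2), using a toy module with hypothetical names and shapes (not the project's code): the KV cache write is an in-place `index_copy_`, and once `torch.export` accepts that mutation it is recorded in the graph signature, which is exactly the in-memory state a Circle backend would have to update between decode steps.

```python
import torch


class TinyKVCache(torch.nn.Module):
    """Toy stand-in for a single-layer KV cache (hypothetical shapes)."""

    def __init__(self, max_len=8, dim=4):
        super().__init__()
        # Pre-allocated cache tensor, overwritten in place at each decode step.
        self.register_buffer("keys", torch.zeros(1, 1, max_len, dim))

    def forward(self, key_states, cache_position):
        # Same write pattern as the HF cache update: in-place index_copy_
        # along the sequence dimension.
        self.keys.index_copy_(2, cache_position, key_states)
        return self.keys


ep = torch.export.export(
    TinyKVCache(), (torch.randn(1, 1, 1, 4), torch.tensor([0]))
)
# The buffer mutation is tracked in the graph signature; this is the state
# a Circle runtime would need to update in memory between decode steps.
print(ep.graph_signature.buffers_to_mutate)
```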
Importance
- ASR/Llama/VLM models include self-attention/cross-attention, which implies handling multiple KV caches.
- speculative decoding
KV Cache's aten origins
ERROR:tico.utils.convert:NOT SUPPORTED OPERATOR
(op) index_put.default
(trace) File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/_dynamo/functional_export.py", line 216, in forward
res = self._export_root(*args, **kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 337, in forward
key_states, value_states = past_key_values.update(
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/cache_utils.py", line 783, in update
keys, values = self.layers[layer_idx].update(key_states, value_states, cache_kwargs)
File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/cache_utils.py", line 340, in update
self.values.index_copy_(2, cache_position, value_states)
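A hedged reproduction sketch (hypothetical shapes and class name, not the Whisper model itself) of where the `index_put.default` above comes from: the cache update in `transformers/cache_utils.py` is an in-place `index_copy_` on the cache layer's `values` tensor, and after export plus core-ATen decomposition that write surfaces as an out-of-place index_put/index_copy node, which is the operator `tico.utils.convert` rejects.

```python
import torch


class CacheWrite(torch.nn.Module):
    def __init__(self, heads=2, max_len=16, head_dim=8):
        super().__init__()
        # Stand-in for a cache layer's values: (batch, heads, max_len, head_dim).
        self.register_buffer("values", torch.zeros(1, heads, max_len, head_dim))

    def forward(self, cache_position, value_states):
        # Same call as cache_utils.py in the trace above.
        self.values.index_copy_(2, cache_position, value_states)
        return self.values


ep = torch.export.export(
    CacheWrite(), (torch.tensor([3]), torch.randn(1, 2, 1, 8))
).run_decompositions()

# List the ATen ops in the decomposed graph; the cache write shows up as
# aten.index_put.default (or aten.index_copy.default, depending on the
# decomposition table of the installed torch version).
print({n.target for n in ep.graph.nodes if n.op == "call_function"})
```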