
Support proper KVCache #469

@dayo09


What?

Previously, exporting a model with a KV cache was not supported, mainly for two reasons.

(1) torch.export does not allow input mutation (e.g., in torch 2.7).

  • This is resolved in torch 2.10.

(2) ONE Circle does not allow in-memory buffer updates.

  • TBD (will ONE provide support for this?)

If the above limitations are lifted, we could convert a KV cache directly into the corresponding Circle model.
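Until in-place buffer updates are supported, one commonly used workaround is to thread the cache through the model explicitly: the cache comes in as an input and the updated cache goes out as an output, so nothing is mutated in place. This is not necessarily the approach this issue will adopt. The minimal pure-Python sketch below assumes a single per-layer cache stored as nested lists shaped [seq_len][head_dim] (batch and head dimensions dropped for brevity); `functional_cache_update` is a hypothetical helper, not a TICO or transformers API.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # [seq_len][head_dim]; batch/head dims omitted for brevity


def functional_cache_update(
    cache: Matrix, positions: Sequence[int], new_states: Matrix
) -> Matrix:
    """Return a NEW cache whose rows at `positions` are replaced by `new_states`.

    The input cache is left untouched, so an exporter sees a pure function
    (cache_in -> cache_out) instead of an in-place buffer mutation.
    """
    if len(positions) != len(new_states):
        raise ValueError("need exactly one new row per position")
    updated = [list(row) for row in cache]  # copy every row; no aliasing with the input
    for pos, row in zip(positions, new_states):
        updated[pos] = list(row)
    return updated


# Hypothetical decode loop: the caller owns the cache and feeds it back each step.
cache_in = [[0.0, 0.0] for _ in range(4)]  # static cache with max_len = 4
step0 = functional_cache_update(cache_in, [0], [[1.0, 1.0]])
step1 = functional_cache_update(step0, [1], [[2.0, 2.0]])

print(cache_in)  # [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(step1)     # [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
```

The trade-off is extra copies at the model boundary, in exchange for a graph that contains no buffer mutation.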

Importance

  • ASR, Llama, and VLM models include self-attention and/or cross-attention, which implies handling multiple KV caches.
      • Speculative decoding

ATen origins of the KV cache update

```
ERROR:tico.utils.convert:NOT SUPPORTED OPERATOR
        (op) index_put.default
        (trace)   File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/torch/_dynamo/functional_export.py", line 216, in forward
    res = self._export_root(*args, **kwargs)
  File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 337, in forward
    key_states, value_states = past_key_values.update(
  File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/cache_utils.py", line 783, in update
    keys, values = self.layers[layer_idx].update(key_states, value_states, cache_kwargs)
  File "/home/dayo/miniconda3/envs/py310-tvm/lib/python3.10/site-packages/transformers/cache_utils.py", line 340, in update
    self.values.index_copy_(2, cache_position, value_states)
```
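The traced source line calls `index_copy_(2, cache_position, value_states)` on a cache shaped [batch, heads, seq_len, head_dim], and the converter reports it as `index_put.default`. To make explicit what a Circle lowering would have to express, here is an out-of-place, pure-Python sketch of that update. It mirrors the semantics of `Tensor.index_copy` along dim 2. The helper name `index_copy_dim2` and the nested-list layout are illustrative assumptions, not part of TICO or transformers.

```python
import copy
from typing import List, Sequence

Tensor4D = List[List[List[List[float]]]]  # [batch][heads][seq_len][head_dim]


def index_copy_dim2(
    cache: Tensor4D, cache_position: Sequence[int], states: Tensor4D
) -> Tensor4D:
    """Out-of-place equivalent of cache.index_copy_(2, cache_position, states).

    For every b, h, i: result[b][h][cache_position[i]] = states[b][h][i].
    """
    result = copy.deepcopy(cache)  # the original cache is never mutated
    for b, heads in enumerate(states):
        for h, rows in enumerate(heads):
            if len(rows) != len(cache_position):
                raise ValueError("states seq_len must equal len(cache_position)")
            for i, pos in enumerate(cache_position):
                result[b][h][pos] = list(rows[i])
    return result


# Usage: batch=1, heads=2, max_len=3, head_dim=2; write one new token at position 1.
cache = [[[[0.0, 0.0] for _ in range(3)] for _ in range(2)]]
value_states = [[[[1.0, 1.0]], [[2.0, 2.0]]]]  # shape [1][2][1][2]
out = index_copy_dim2(cache, [1], value_states)
print(out[0][0])  # [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
print(out[0][1])  # [[0.0, 0.0], [2.0, 2.0], [0.0, 0.0]]
```

Only the sequence axis (dim 2) is scattered into; batch, head, and head_dim positions are copied one-to-one, which is why a static cache plus `cache_position` is enough to describe each decoding step.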
