Currently we apply RoPE to the keys and cache the rotated result. From a 1-minute think, this feels like it won't work with RoPE scaling: if the scaling parameters change after keys have been cached (e.g. when the context grows and we switch scaling), the cache holds keys rotated with the old parameters, and they can't be re-rotated without the original pre-RoPE values. To read: https://arxiv.org/abs/2309.00071 (YaRN), https://github.com/ggerganov/llama.cpp/issues/2060
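
A minimal sketch of the concern, assuming a simplified RoPE where `scale` stands in for a linear position-interpolation factor (the function and its parameterisation are illustrative, not the actual llama.cpp implementation):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0, scale=1.0):
    """Rotate a single head vector x at position `pos`.
    `scale` is a stand-in for a RoPE-scaling factor (hypothetical)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # theta_i
    angles = (pos * scale) * freqs              # scaled position times theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

k = np.random.randn(8)

# Caching the post-RoPE key bakes the current scale into the cache entry.
cached = rope_rotate(k, pos=100, scale=1.0)

# If the scaling factor later changes, the cached entry is stale; recomputing
# it would require the original un-rotated k, which we no longer have.
wanted = rope_rotate(k, pos=100, scale=0.25)
assert not np.allclose(cached, wanted)
```

Caching pre-RoPE keys and rotating at attention time would avoid this, at the cost of redoing the rotation on every read.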