Currently we apply RoPE to the keys and cache the rotated result. From a 1-minute think, this feels like it won't work with RoPE scaling: if the scaling parameters change after keys have been cached (e.g. when the context grows and we switch scaling), the cache holds keys rotated with the old parameters, and they can't be re-rotated without the original pre-RoPE values. To read: https://arxiv.org/abs/2309.00071 (YaRN), https://github.com/ggerganov/llama.cpp/issues/2060
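
A minimal sketch of the concern, assuming a simplified RoPE where `scale` stands in for a linear position-interpolation factor (the function and its parameterisation are illustrative, not the actual llama.cpp implementation):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0, scale=1.0):
    """Rotate a single head vector x at position `pos`.
    `scale` is a stand-in for a RoPE-scaling factor (hypothetical)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # theta_i
    angles = (pos * scale) * freqs              # scaled position times theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

k = np.random.randn(8)

# Caching the post-RoPE key bakes the current scale into the cache entry.
cached = rope_rotate(k, pos=100, scale=1.0)

# If the scaling factor later changes, the cached entry is stale; recomputing
# it would require the original un-rotated k, which we no longer have.
wanted = rope_rotate(k, pos=100, scale=0.25)
assert not np.allclose(cached, wanted)
```

Caching pre-RoPE keys and rotating at attention time would avoid this, at the cost of redoing the rotation on every read.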