Hi there,
Is there any plan to support Mistral and Mixtral?
For Mistral, I think it is essentially a GPT-2-style decoder with a few tweaks (e.g., a different activation function and sliding window attention), which might make it easier to support. I'm not sure whether Mixtral is more complicated, since it is a mixture-of-experts (MoE) model.
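To make the two differences more concrete, here is a rough, framework-free Python sketch based on my reading of the Mistral and Mixtral papers. It covers a sliding-window causal mask (Mistral 7B uses a window of 4096 tokens) and top-2 routing over experts (Mixtral has 8 experts per layer). The function names are my own and don't come from any library. It's only meant to illustrate the idea, not to match the reference implementation.

```python
import math

def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j:
    causal (j <= i) and limited to the last `window` positions (i - j < window)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]

def top_k_moe(x, gate_weights, experts, k=2):
    """Route one token vector x through the k highest-scoring experts.
    gate_weights: one router weight vector per expert (linear, no bias).
    experts: callables mapping a vector to a vector of the same size.
    Softmax is taken over the k selected router logits only."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)  # subtract max for numerical stability
    exps = {e: math.exp(logits[e] - m) for e in top}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for e in top:
        weight = exps[e] / total
        y = experts[e](x)
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out

# Example: 5 tokens, window of 3. Token 4 sees only tokens 2, 3 and 4.
print(sliding_window_causal_mask(5, 3)[4])  # [False, False, True, True, True]
```

The sliding window only changes the attention mask, which is why Mistral feels like a small change. The MoE part is what I suspect makes Mixtral harder. Each token's experts depend on its router scores, so the feed-forward step is no longer a single dense matmul.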