It would be great if every high-scoring small model on MMLU-Pro could be validated so that its reported scores are reliable and complete. These small models are valuable because they are fast and cheap to run, and they showcase important trends in model and distillation efficiency.
Small, high-scoring models
QwQ Family
Microsoft/Phi Family
Qwen Family
Google Family
Mistral Family
Other Models