data(manual-annotation): record 5th model release date/source #38

Open
mrshu wants to merge 1 commit into evaleval:main from
mrshu:mrshu/model-release-date-5th-top-model

mrshu commented Jan 21, 2026

Populate per-benchmark columns using the release date and primary source link for the 5th-highest-performing entry.
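
As a rough illustration of the selection rule described above, here is a minimal Python sketch that ranks entries by score and picks the n-th highest. It is not the repository's actual tooling: the `Entry` class, the `flagged_incorrect` field, and the `nth_highest` helper are hypothetical names introduced for this example. The sample data is limited to the four entries quoted in the review thread on this PR, so the example uses ranks that those four entries can support.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    model: str
    score: float
    # Hypothetical flag mirroring the "(Incorrect)" note in the annotated data.
    flagged_incorrect: bool = False


def nth_highest(entries: List[Entry], n: int, skip_flagged: bool = False) -> Optional[Entry]:
    """Return the entry ranked n-th by score (1-indexed), or None if fewer than n remain."""
    pool = [e for e in entries if not (skip_flagged and e.flagged_incorrect)]
    ranked = sorted(pool, key=lambda e: e.score, reverse=True)
    return ranked[n - 1] if 1 <= n <= len(ranked) else None


# Only the four entries quoted in the review thread; no other scores are assumed.
entries = [
    Entry("Gemini 1.5 Pro", 0.892),
    Entry("Qwen 3 235B-A22B", 0.8887),
    Entry("Kimi K2", 0.8871),
    Entry("Gemma 3 27B", 0.876, flagged_incorrect=True),
]

fourth = nth_highest(entries, 4)
fourth_without_flagged = nth_highest(entries, 4, skip_flagged=True)
```

With these four entries, `fourth` is the Gemma 3 27B entry, while `fourth_without_flagged` is `None` because only three unflagged entries remain. Whether flagged entries should count toward the ranking is exactly the question raised in the review thread, so the sketch leaves it as a parameter.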

mrshu commented Jan 21, 2026

cc @mubasharaak @zouharvi

Gemini 1.5 Pro (Google): 0.892
Qwen 3 235B- A22B 0.8887 - Qwen3 Technical Report
Kimi K2: 0.8871 - Kimi K2: Open Agentic Intelligence
Gemma 3 27B (Google): 0.876 (Incorrect)
Contributor Author

@mubasharaak I just noticed this here. I am not sure why Gemma would be marked incorrect, but if it is, it probably should not count as the 5th-highest-performing model, right?

Collaborator

Hm, good question. Can you check the leaderboard that is linked as the source?

Contributor Author

I went through it @mubasharaak and I really cannot see anything "incorrect" about it :)

Image
