
Tiny evaluation of leading LLMs on competitive programming problems


anpaure/cp_eval


Open In Colab

Competitive programming eval

eval_results

Problems are taken from Kazakhstan's 2024 regional and national informatics olympiads. This selection is a good fit because the problems are relatively new, were originally available only in Russian, and are unlikely to have been crawled online, since they are shared exclusively in a private group. Moreover, no public editorials or solutions exist, so there is no risk that the models have overfit to published solutions. Problems are sorted into difficulty groups: A is the easiest, B slightly harder, and C the hardest.

The results were obtained by prompting five SOTA models (deepseek-r1-lite-preview, chatgpt-4o-latest-20241120, claude-3-5-sonnet-20241022, o1-mini, qwq-32b-preview) with "You are a competitive programming grandmaster. Solve the following programming problem." followed by the problem statement.
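The exact harness isn't described beyond the prompt, but here is a minimal sketch of this setup, assuming an OpenAI-compatible chat API (the `client` configuration and the `solve` helper are illustrative, not the code actually used in the notebook):

```python
from openai import OpenAI

client = OpenAI()  # assumption: an OpenAI-compatible endpoint; other providers differ

SYSTEM_PROMPT = (
    "You are a competitive programming grandmaster. "
    "Solve the following programming problem."
)

def solve(model: str, statement: str) -> str:
    """One zero-shot attempt: the fixed instruction plus the raw problem statement."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": statement},
        ],
    )
    return response.choices[0].message.content
```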

Some prompts were re-run a couple of times if the model failed to produce a solution that earned any points. When a model generally got the idea right but introduced a bug somewhere, it was told "You have an error somewhere, debug your code." without any additional pointers.
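Expressed against the hypothetical `solve` sketch above, the debug nudge is just one extra assistant/user turn appended to the same conversation:

```python
def debug_retry(model: str, statement: str, failed_solution: str) -> str:
    """Re-send the whole conversation plus the fixed debug nudge, no extra pointers."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": statement},
            {"role": "assistant", "content": failed_solution},
            {"role": "user", "content": "You have an error somewhere, debug your code."},
        ],
    )
    return response.choices[0].message.content
```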

Essentially, the goal wasn't to conduct a perfect experiment, but rather to squeeze the most out of the models without any advanced prompting or RAG. The code probably won't execute correctly in the notebook, because LLMs sometimes generate odd code; I could have cleaned it up, but I decided to keep the raw output instead.
