
Commit fac7237

Ganqu authored and committed
add bench
2 parents 3f50762 + 0d3192b commit fac7237

31 files changed: +1803 -7 lines changed

README.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
1+
# P1: Mastering Physics Olympiads with Reinforcement Learning
2+
3+
4+
5+
[![Blog](https://img.shields.io/badge/Blog-P1-0D1117?style=for-the-badge&logo=githubpages&logoColor=white)](https://prime-rl.github.io/P1/)
6+
[![P1-30B](https://img.shields.io/badge/Hugging%20Face-P1--30B--A3B-FCD022?style=for-the-badge&logo=huggingface)](https://huggingface.co/PRIME-RL/P1-30B-A3B)
7+
[![P1-235B](https://img.shields.io/badge/Hugging%20Face-P1--235B--A22B-FCD022?style=for-the-badge&logo=huggingface)](https://huggingface.co/PRIME-RL/P1-235B-A22B)
8+
[![Leaderboard](https://img.shields.io/badge/Leaderboard-HiPhO-2DBA4E?style=for-the-badge&logo=chartdotjs&logoColor=white)](https://phyarena.github.io/)
9+
10+
<p align="center">
11+
<img src="docs/imgs/Score_IPhO_2025_P1_v2.jpg" alt="IPhO 2025 Score" width="100%">
12+
</p>
13+
14+
15+
16+
## Overview
17+
18+
Physics reasoning is central to understanding and shaping the real world. Top contests like the **International Physics Olympiad (IPhO)** set a high bar for complex reasoning and deep physical understanding — a benchmark for evaluating AI's grasp of reality.
19+
20+
**P1** is the first open-source model series designed to tackle Olympiad-level physics reasoning through multi-stage reinforcement learning (RL) and a co-evolutionary multi-agent system (PhysicsMinions). It achieved gold medal-level performance on IPhO 2025. We release two model versions:
21+
22+
- **[P1-30B-A3B](https://huggingface.co/PRIME-RL/P1-30B-A3B)**: A 30B parameter model that surpasses larger closed-source models, demonstrating exceptional efficiency
23+
- **[P1-235B-A22B](https://huggingface.co/PRIME-RL/P1-235B-A22B)**: A 235B parameter model achieving gold medal performance on IPhO 2025, rivaling top closed-source models
24+
25+
---
26+
27+
## Results
28+
29+
P1 models demonstrate **top-tier physics reasoning** across all HiPhO contests.
30+
31+
<p align="center">
32+
<img src="docs/source_png/leaderboard.png" alt="HiPhO Leaderboard" width="100%">
33+
</p>
34+
35+
36+
---
37+
38+
P1’s physics reasoning transfers effectively across other STEM domains.
39+
40+
#### STEM Benchmarks
41+
42+
| Benchmark     | P1-235B-A22B | Qwen3-235B-A22B-Thinking-2507 | P1-30B-A3B | Qwen3-30B-A3B-Thinking-2507 |
| ------------- | -----------: | ----------------------------: | ---------: | --------------------------: |
| AIME24        |         95.0 |                          94.6 |       91.0 |                        90.4 |
| AIME25        |         95.0 |                          94.2 |       91.0 |                        85.0 |
| HMMT          |         80.8 |                          81.7 |       76.9 |                        71.3 |
| GPQA          |         81.4 |                          79.4 |       74.4 |                        73.0 |
| HLE           |         19.1 |                          17.5 |       14.3 |                        11.6 |
| LiveCodeBench |         75.8 |                          76.2 |       68.1 |                        66.7 |
| LiveBench     |         79.8 |                          80.3 |       77.0 |                        76.6 |
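
To make the comparison in the STEM Benchmarks table easier to read, the following sketch computes the per-benchmark difference between each P1 model and the Qwen3 model of the same size. The scores are copied from the table; the names `SCORES` and `deltas` are illustrative and not part of the repository.

```python
# Scores copied from the STEM Benchmarks table.
# Tuple order: P1-235B-A22B, Qwen3-235B-A22B-Thinking-2507,
#              P1-30B-A3B,   Qwen3-30B-A3B-Thinking-2507
SCORES = {
    "AIME24": (95.0, 94.6, 91.0, 90.4),
    "AIME25": (95.0, 94.2, 91.0, 85.0),
    "HMMT": (80.8, 81.7, 76.9, 71.3),
    "GPQA": (81.4, 79.4, 74.4, 73.0),
    "HLE": (19.1, 17.5, 14.3, 11.6),
    "LiveCodeBench": (75.8, 76.2, 68.1, 66.7),
    "LiveBench": (79.8, 80.3, 77.0, 76.6),
}


def deltas(scores):
    """Return {benchmark: (P1-235B minus Qwen3-235B, P1-30B minus Qwen3-30B)}."""
    return {
        name: (round(p235 - q235, 1), round(p30 - q30, 1))
        for name, (p235, q235, p30, q30) in scores.items()
    }


for name, (d235, d30) in deltas(SCORES).items():
    print(f"{name:<14} 235B: {d235:+.1f}   30B: {d30:+.1f}")
```

With the numbers as listed, P1-30B-A3B scores higher than its Qwen3 counterpart on all seven benchmarks, while P1-235B-A22B scores higher on four of seven and slightly lower on HMMT, LiveCodeBench, and LiveBench.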
51+
52+
## 🧮 HiPhO Benchmark
53+
54+
[**HiPhO (High School Physics Olympiad)**](https://arxiv.org/abs/2509.07894) is the first benchmark focused on recent Olympiad-level physics contests with **human-aligned evaluation**.
55+
56+
📚 It compiles 13 competitions (IPhO, APhO, EuPhO, etc.) from 2024–2025, using **official rubrics** and **fine-grained scoring** aligned with medal cutoffs.
57+
58+
---
59+
60+
## Co-Evolution Multi-Agent System: PhysicsMinions
61+
62+
To go beyond single-model limits, P1 introduces [**PhysicsMinions**](https://arxiv.org/abs/2509.24855) — a co-evolution multi-agent system that iteratively refines solutions through self-verification and reflection.
63+
64+
| Module            | Function                                                     |
| ----------------- | ------------------------------------------------------------ |
| **Visual Studio** | Extracts structured visual information from diagrams (not used in current experiments). |
| **Logic Studio**  | Generates and refines initial reasoning chains.              |
| **Review Studio** | Performs two-stage validation: physical consistency and logical correctness. |
69+
70+
Failures trigger a **feedback loop** to improve the reasoning process — resulting in stronger robustness and reliability.
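
The generate, review, and refine cycle described above can be sketched roughly as follows. This is a hypothetical illustration and not the PhysicsMinions implementation: the `Review` class, the `solve_with_review` function, the callable signatures, and the toy stand-ins are all assumptions made for the example, and the Visual Studio is left out since it is not used in the current experiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    """Outcome of one review stage (hypothetical structure)."""
    passed: bool
    feedback: str = ""


# Hypothetical signatures; the real PhysicsMinions interfaces are not shown here.
Solver = Callable[[str, Optional[str], Optional[str]], str]  # problem, previous solution, feedback
Checker = Callable[[str, str], Review]                       # problem, candidate solution


def first_failure(problem: str, solution: str, checkers: List[Checker]) -> Optional[str]:
    """Run the review stages in order and return feedback from the first failing one."""
    for check in checkers:
        review = check(problem, solution)
        if not review.passed:
            return review.feedback
    return None


def solve_with_review(problem: str, solver: Solver, checkers: List[Checker],
                      max_rounds: int = 3) -> Tuple[str, bool]:
    """Generate a solution, then refine it while any review stage fails."""
    solution = solver(problem, None, None)
    for _ in range(max_rounds):
        feedback = first_failure(problem, solution, checkers)
        if feedback is None:
            return solution, True
        # A failed review feeds back into the next refinement attempt.
        solution = solver(problem, solution, feedback)
    return solution, first_failure(problem, solution, checkers) is None


# Toy stand-ins for a model-backed solver and the two review stages.
def toy_solver(problem, previous, feedback):
    return "v = 3 m" if feedback is None else "v = 3 m/s"


def toy_physics_check(problem, solution):
    ok = solution.endswith("m/s")
    return Review(ok, "" if ok else "speed must be in m/s")


def toy_logic_check(problem, solution):
    ok = "=" in solution
    return Review(ok, "" if ok else "no final result stated")


result = solve_with_review("Find the speed.", toy_solver,
                           [toy_physics_check, toy_logic_check])
print(result)
```

In this toy run the first draft fails the physical-consistency stage, the feedback is passed back to the solver, and the refined draft passes both stages. Capping the number of rounds is a design choice of the sketch, not something stated in the README.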
71+
72+
73+
---
74+
75+
76+
## Acknowledgements
77+
78+
We are grateful to the open-source community for their invaluable contributions. Special thanks to:
79+
80+
- **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research
81+
- **[slime](https://github.com/THUDM/slime)** - for the efficient reinforcement learning framework that powered our training pipeline
82+
- **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline
83+
- **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure
84+
- **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework
85+
86+
We also thank the colleagues and collaborators who supported the development of the P1 models, the accompanying datasets, and the visual assets.
87+
88+
89+
## 🧾 Citation
90+
91+
If you find this work useful, please cite:
92+
93+
```bibtex
@misc{p12025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
```

docs/.DS_Store

0 Bytes
Binary file not shown.

docs/case_study/.DS_Store

6 KB
Binary file not shown.

docs/case_study/case/a.md

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
# Cox's Timepiece (problem background)
2+
3+
In 1765, British clockmaker James Cox invented a clock whose only source of energy is the fluctuations in atmospheric pressure. Cox's clock used two vessels containing mercury. Changes in atmospheric pressure caused mercury to move between the vessels, and the two vessels to move relative to each other. This movement acted as an energy source for the actual clock.
4+
5+
We propose an analysis of this device. Throughout, we assume that
6+
- the Earth's gravitational field $\vec{g} = -g\vec{u_{z}}$ is uniform with $g = 9.8 \mathrm{m} \cdot \mathrm{s}^{-2}$ and $\vec{u_{z}}$ a unit vector;
7+
- all liquids are incompressible and their density is denoted $\rho$;
8+
- no surface tension effects will be considered;
9+
- the variations of atmospheric pressure with altitude are neglected;
10+
- the surrounding temperature $T_{\mathrm{a}}$ is uniform and all transformations are isothermal.
11+
12+
![figure1](IPhO_2025_2_a_1.png)
13+
Fig. 1. Artistic view of Cox's clock
14+
15+
## Part B - Two-part barometric tube (information from Part B is needed to solve Part C)
16+
17+
![figure4](IPhO_2025_2_b_2.png)
18+
Fig. 4. Simplified model of the perturbative term $P_{1}(t)$
19+
20+
## Part C - Cox's timepiece (background for Part C)
21+
22+
The real mechanism developed by Cox is complex (Fig. 5). We study a simplified version, depicted in Fig. 6, and described below:
23+
- a cylindrical bottom cistern containing a mercury bath;
24+
- a two-part barometric tube identical to that studied in part B, which is still completely emptied of any air, is dipped into the bath;
25+
- the cistern and the two-part tube are each suspended by a cable. Both cables (assumed to be inextensible and of negligible mass) pass through a system of ideal pulleys and are attached to either side of the same mass $M$, which can slide on a horizontal surface;
26+
- the total volume of liquid mercury contained in the system is $V_{\ell} = 5 \mathrm{L}$.
27+
28+
The height, cross-section and masses of each part are given in Table 2. The position of mass $M$ is referenced by the coordinate $x$ of its center of mass. We consider solid friction between the horizontal support and the mass $M$, without distinction between static and dynamic coefficients; the magnitude of this force when sliding occurs is denoted $F_{\mathrm{s}}$.
29+
30+
Two stops limit the displacement of the mass $M$ such that $-X \leq x \leq X$ (with $X > 0$). Assume that the value of $X$ guarantees that
31+
- the bottom of the two-part tube never touches the bottom of the cistern nor comes out of the liquid bath;
32+
- the altitude $z_{\ell}$ of the mercury column is always in the upper bulb.
33+
34+
![figure5](IPhO_2025_2_c_1.png)
35+
Fig. 5. Real Cox's timepiece (without mercury)
36+
37+
![figure6](IPhO_2025_2_c_2.png)
38+
Fig. 6. Sketch of the system modeling the timepiece
39+
40+
|Reference|Name|Height|Cross section area|Empty mass|
|-|-|-|-|-|
|1|cistern|$H_{\mathrm{c}} = 30 \mathrm{cm}$|$S_{\mathrm{c}} = 210 \mathrm{cm}^{2}$|$m_{\mathrm{c}}$|
|2|tubular part of the barometric tube|$H_{\mathrm{t}} = 80 \mathrm{cm}$|$S_{\mathrm{t}} = 5 \mathrm{cm}^{2}$|total mass of the barometric tube (rows 2 and $2^{\prime}$): $m_{\mathrm{tb}}$|
|$2^{\prime}$|bulb of the barometric tube|$H_{\mathrm{b}} = 20 \mathrm{cm}$|$S_{\mathrm{b}} = 200 \mathrm{cm}^{2}$|(included in $m_{\mathrm{tb}}$)|
45+
Table 2. Dimensions and notations for the model system
46+
47+
The system evolves in contact with the atmosphere, whose pressure fluctuates as in Fig. 4 (still with amplitude $A = 5 \times 10^{2} \mathrm{Pa}$ and period $\tau_{1} = 1$ week). At the start $t = 0$, the mass $M$ is at rest at $x = 0$ and the tensions exerted by the two cables on either side of the mass $M$ are in balance while $P_{1}(0) = 0$. We define
48+
49+
$$
\xi = \frac{S_{\mathrm{b}} + S_{\mathrm{c}} - S_{\mathrm{t}}}{S_{\mathrm{b}} S_{\mathrm{c}}} \frac{F_{\mathrm{s}}}{A} \simeq \frac{S_{\mathrm{b}} + S_{\mathrm{c}}}{S_{\mathrm{b}} S_{\mathrm{c}}} \frac{F_{\mathrm{s}}}{A}
\qquad (3)
$$
53+
where the last expression uses that $S_{\mathrm{t}} \ll S_{\mathrm{b}}, S_{\mathrm{c}}$ (which we will assume is valid until the end of the problem).
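
As a quick sanity check on the approximation $S_{\mathrm{t}} \ll S_{\mathrm{b}}, S_{\mathrm{c}}$, the sketch below compares the exact and approximate geometric prefactors of $\xi$ using the cross-sections from Table 2. The function name `xi_geometric_factor` is illustrative only; since $F_{\mathrm{s}}$ is not given numerically, only the ratio of the two prefactors is evaluated.

```python
# Cross-sections from Table 2, in cm^2.
S_B = 200.0  # bulb of the barometric tube
S_C = 210.0  # cistern
S_T = 5.0    # tubular part of the barometric tube


def xi_geometric_factor(s_b, s_c, s_t=0.0):
    """Geometric prefactor of xi in Eq. (3), in 1/cm^2.

    Passing s_t=0 gives the approximate form (S_b + S_c) / (S_b * S_c).
    """
    return (s_b + s_c - s_t) / (s_b * s_c)


exact = xi_geometric_factor(S_B, S_C, S_T)
approx = xi_geometric_factor(S_B, S_C)
rel_error = (approx - exact) / exact

print(f"exact  prefactor: {exact:.6f} cm^-2")
print(f"approx prefactor: {approx:.6f} cm^-2")
print(f"relative error:   {rel_error:.2%}")
```

With the Table 2 values the relative error equals $S_{\mathrm{t}} / (S_{\mathrm{b}} + S_{\mathrm{c}} - S_{\mathrm{t}}) = 5/405$, a bit over one percent, so dropping $S_{\mathrm{t}}$ changes $\xi$ only slightly.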
54+
55+
(C.1) Determine the threshold $\xi^{\star}$ such that $M$ remains indefinitely at rest when $\xi > \xi^{\star}$.
56+
57+
For question (C.2) only, suppose that the mass $M$ is temporarily blocked at $x = X$.
58+
59+
(C.2) Give an expression for the total tension force $\vec{T} = T \vec{u_{x}}$ acting on the mass $M$ due to the tension in the two cables at this position, when $P_{1} = 0$, in terms of $\rho, g, X$ and the pertinent cross-sections.
60+
61+
When $\xi < \xi^{\star}$, starting again from $x = 0$ and $P_{1} = 0$, two different behaviours can be observed for $t \geq 0$. To distinguish them, we need to introduce another parameter
62+
$$
\lambda = \frac{2 (S_{\mathrm{b}} - S_{\mathrm{t}})}{S_{\mathrm{b}}} \frac{\rho g X}{A} \simeq \frac{2 \rho g X}{A}
\qquad (4)
$$
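
To get a feel for the scale of $\lambda$, the sketch below evaluates Eq. (4) for a given stop position $X$. The mercury density used here (about $13.6 \times 10^{3}\ \mathrm{kg \cdot m^{-3}}$) is an assumption, since it is not stated in this excerpt, and the names `lam` and `h_equiv` are illustrative. The value of $X$ is left as an input because the problem does not fix it.

```python
RHO_HG = 13.6e3  # kg/m^3, ASSUMED density of mercury (not given in this excerpt)
G = 9.8          # m/s^2, from the problem statement
A = 5e2          # Pa, amplitude of the pressure perturbation P_1(t)


def lam(x_stop_m, s_b=200.0, s_t=5.0, exact=True):
    """Dimensionless parameter lambda of Eq. (4) for a stop at x_stop_m (in metres).

    With exact=False the geometric factor (S_b - S_t) / S_b is replaced by 1.
    Cross-sections default to the Table 2 values (any common unit works).
    """
    geom = (s_b - s_t) / s_b if exact else 1.0
    return 2.0 * geom * RHO_HG * G * x_stop_m / A


# Height of a mercury column whose hydrostatic pressure equals A.
h_equiv = A / (RHO_HG * G)

print(f"A corresponds to {h_equiv * 1e3:.2f} mm of mercury")
print(f"lambda (approx) at X = h_equiv / 2: {lam(h_equiv / 2, exact=False):.3f}")
print(f"lambda (exact)  at X = h_equiv / 2: {lam(h_equiv / 2):.3f}")
```

Under the assumed density, the amplitude $A$ corresponds to a few millimetres of mercury, so $\lambda \simeq 2X / h_{\mathrm{equiv}}$ compares the allowed travel of the mass with that height. The exact and approximate forms differ by the factor $(S_{\mathrm{b}} - S_{\mathrm{t}}) / S_{\mathrm{b}} = 0.975$ for the Table 2 values.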
66+
67+
(C.3) Complete the table in the answer sheet to indicate the condition under which each regime is obtained. Conditions must be expressed as inequalities on $\xi$ and/or $\lambda$. In addition, sketch the variations of $x(t) / X$ for $t \in [0, 3 \tau_{1}]$ that are consistent with the variations of $P_{1}(t) / A$ already present. The coordinates of notable points need not be specified.
68+
69+
In the real Cox's timepiece, the energy provided by the mechanism is stored using a system of ratchets and used to raise a counterweight, as in a traditional clock. In the simplified model studied here, the energy recovered by the clock corresponds to the energy dissipated by the friction force exerted by the horizontal surface on the mass $M$. From now on, we assume that the system is dimensioned so that it works in the regime that allows the clock to recover energy. We also assume that the permanent regime is established. We denote by $W$ the energy dissipated by the solid friction force during a period $\tau_{1}$, which can be expressed only in terms of $F_{\mathrm{s}}$ and $X$.
70+
71+
All else equal, $F_{\mathrm{s}}$ and $X$ can be adjusted to maximize the energy $W$; we denote $F_{\mathrm{s}}^{\star}$ and $X^{\star}$ their respective values in the optimal situation.
72+
73+
74+
### (This question)
75+
(C.4) Considering $S_{\mathrm{b}} \simeq S_{\mathrm{c}}$ and $S_{\mathrm{t}} \ll S_{\mathrm{b}}$, determine the expressions for (1) $F_{\mathrm{s}}^{\star}$ and (2) $X^{\star}$ as functions of $\rho, g, S_{\mathrm{c}}$ and $A$. (3) Express the corresponding maximum energy $W^{\star}$, (4) then calculate its numerical value in $\mathrm{mJ}$ with $A = 5 \times 10^{2} \mathrm{Pa}$.
