Closed
4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -5,6 +5,10 @@ repos:
     rev: v3.2.0
     hooks:
       - id: check-added-large-files
+  - repo: https://github.com/rhysd/actionlint
+    rev: v1.7.7
+    hooks:
+      - id: actionlint
   - repo: local
     hooks:
       - id: nb-clean

100 changes: 100 additions & 0 deletions samples/cli/finetuning/reinforcement/data/rft_training_set.jsonl

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions samples/cli/finetuning/reinforcement/data/rft_validation_set.jsonl
@@ -0,0 +1,10 @@
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\nExample 3 Given $a=\\frac{1}{2} \\sqrt{\\sqrt{2}+\\frac{1}{8}}-\\frac{\\sqrt{2}}{8}$. Try to find the value of $a^{2}+\\sqrt{a^{4}+a+1}$.", "role": "user"}], "answer": "\\sqrt{2"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\nThe surface area of a cube is 24 . The volume of the cube is\n(A) 4\n(B) $3 \\sqrt{3}$\n(C) 9\n(D) 16\n(E) 8", "role": "user"}], "answer": "E"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n2. Given $\\sin 2 x=\\frac{\\sin \\theta+\\cos \\theta}{2}$, $\\cos ^{2} x=\\sin \\theta \\cdot \\cos \\theta$.\nThen, the value of $\\cos 2 x$ is ( ).\n(A) $\\frac{-1 \\pm \\sqrt{33}}{8}$\n(B) $\\frac{-1+\\sqrt{33}}{8}$\n(C) $\\frac{-1-\\sqrt{33}}{8}$\n(D) 0", "role": "user"}], "answer": "C"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n4. Find the sum\n\n$$\n\\log \\operatorname{tg} 1^{\\circ}+\\log \\operatorname{tg} 2^{\\circ}+\\ldots+\\log \\operatorname{tg} 89^{\\circ}\n$$", "role": "user"}], "answer": "0"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n1. Find the sum of the numbers:\n\n$$\n3+33+333+3333+\\cdots+\\underbrace{33333 \\ldots 3}_{2018 \\text { of them }} .\n$$", "role": "user"}], "answer": "\\dfrac{10^{2019"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n12.407 A line perpendicular to the chord of a segment divides the chord in the ratio 1:4, and the arc - in the ratio $1: 2$. Find the cosine of the central angle subtended by this arc.", "role": "user"}], "answer": "-\\dfrac{23"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\nFor some positive integer $n$, a coin will be flipped $n$ times to obtain a sequence of $n$ heads and tails. For each flip of the coin, there is probability $p$ of obtaining a head and probability $1-p$ of obtaining a tail, where $0<p<1$ is a rational number.\nKim writes all $2^n$ possible sequences of $n$ heads and tails in two columns, with some sequences in the left column and the remaining sequences in the right column. Kim would like the sequence produced by the coin flips to appear in the left column with probability $1/2$.\nDetermine all pairs $(n,p)$ for which this is possible.", "role": "user"}], "answer": "(n, \\frac{1"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n212. $\\int \\arctan x \\, dx$.", "role": "user"}], "answer": "x \\arctan x - \\frac{1"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\nExample 1. A pocket contains 10 white balls and 8 black balls. 4 balls are drawn from it. Find: (1) the probability of drawing exactly 2 white balls; (2) the probability of drawing at most 2 white balls.", "role": "user"}], "answer": "\\dfrac{21"}
{"messages": [{"content": "You are a mathematical reasoning expert. Solve problems with detailed step-by-step thinking and provide final answers in \\boxed{} format. Show all intermediate calculations and explain your reasoning clearly.\n\n# Task 1. (2 points)\n\nIn a class, each student has either 5 or 6 friends (friendship is mutual), and any two friends have a different number of friends in the class. What is the smallest number of students, greater than 0, that can be in the class?", "role": "user"}], "answer": "11"}
22 changes: 22 additions & 0 deletions samples/cli/finetuning/reinforcement/sample_finetuning_rft.yaml
@@ -0,0 +1,22 @@
name: rft-ft-cli-demo
description: Template to demonstrate reinforcement fine-tuning via CLI
model: o4-mini-2025-04-16
method:
  type: reinforcement
  reinforcement:
    hyperparameters:
      epochs: 3
      batch_size: 8
      learning_rate_multiplier: 1.0
      beta: 0.5
      compute_multiplier: 1.0
      reasoning_effort: high
    grader:
      type: string_check
      input: "{{sample.output_text}}"
      # The rows in rft_validation_set.jsonl store the expected value under "answer".
      reference: "{{item.answer}}"
      operation: eq
suffix: "rft-trained"
seed: 42
training_file: local:data/rft_training_set.jsonl
validation_file: local:data/rft_validation_set.jsonl
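To make the grader settings concrete, here is a rough Python sketch of how a `string_check` grader with `operation: eq` could score one sample. It assumes that `{{item.<field>}}` placeholders resolve against the dataset row, that `{{sample.output_text}}` resolves against the model output, and that `eq` means strict string equality. `render` and `string_check_eq` are hypothetical names for illustration, not the service's implementation.

```python
import re

# Matches placeholders such as {{item.answer}} or {{sample.output_text}}.
PLACEHOLDER = re.compile(r"\{\{\s*(item|sample)\.(\w+)\s*\}\}")


def render(template: str, item: dict, sample: dict) -> str:
    """Substitute placeholders from the row (item) and model output (sample)."""
    namespaces = {"item": item, "sample": sample}

    def replace(match: re.Match) -> str:
        namespace, field = match.group(1), match.group(2)
        if field not in namespaces[namespace]:
            # A placeholder that names a missing field cannot be resolved.
            raise KeyError(f"{namespace}.{field} is not present")
        return str(namespaces[namespace][field])

    return PLACEHOLDER.sub(replace, template)


def string_check_eq(input_template: str, reference_template: str,
                    item: dict, sample: dict) -> float:
    """Return 1.0 if the rendered input equals the rendered reference exactly, else 0.0."""
    rendered_input = render(input_template, item, sample)
    rendered_reference = render(reference_template, item, sample)
    return 1.0 if rendered_input == rendered_reference else 0.0


# Example: a bare answer matches, an answer wrapped in reasoning text does not.
row = {"answer": "E"}
exact = string_check_eq("{{sample.output_text}}", "{{item.answer}}",
                        row, {"output_text": "E"})
wrapped = string_check_eq("{{sample.output_text}}", "{{item.answer}}",
                          row, {"output_text": "The volume is 8, so \\boxed{E}"})
```

Under these assumptions `exact` is 1.0 and `wrapped` is 0.0. Two points follow for this template: the reference placeholder has to name a field that exists in the dataset rows, and if the service compares the full output text the same way, a prompt that asks for step-by-step reasoning with a `\boxed{}` answer would rarely score a match.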
@@ -8,8 +8,14 @@
   </PropertyGroup>

   <ItemGroup>
-    <PackageReference Include="Azure.AI.Projects" Version="2.0.0-beta.1" />
-    <PackageReference Include="Azure.Identity" Version="1.13.0" />
+    <PackageReference Include="Azure.AI.Projects" Version="1.2.0-alpha.20260128.1" />
+    <PackageReference Include="Azure.AI.Projects.OpenAI" Version="1.0.0-alpha.20260128.1" />
+    <PackageReference Include="Azure.Identity" Version="1.17.1" />
+    <PackageReference Include="DotNetEnv" Version="3.1.1" />
   </ItemGroup>

+  <ItemGroup>
+    <PackageReference Include="OpenAI" Version="2.*-*" />
+  </ItemGroup>
+
 </Project>