
Custom rl loss patch 2 (with key detection) #4

Open

mzio wants to merge 1 commit into thinking-machines-lab:main from mzio:custom_rl_loss_patch_2

Conversation

mzio commented Oct 10, 2025

See #2 (comment) and #3 (comment)

Main issue: at a high level, there is a conflict between how a user would specify an RL loss (and the Datum loss_fn_inputs it requires) and how that data gets processed in training_client, which expects supervised-learning loss_fn_inputs.

@capture_exceptions(fatal=True)
async def forward_backward_custom_async(
    self, data: List[types.Datum], loss_fn: CustomLossFnV1
) -> APIFuture[types.ForwardBackwardOutput]:
    import torch

    # First do a forward pass and get logprobs
    forward_future = await self.forward_async(data, "cross_entropy")
    forward_result = await forward_future.result_async()
    logprobs_list: List[torch.Tensor] = []
    for out in forward_result.loss_fn_outputs:
        logprob = torch.tensor(out["logprobs"].data).clone().detach().requires_grad_(True)
        logprobs_list.append(logprob)

    # Now apply user-provided function
    loss, metrics = loss_fn(data, logprobs_list)
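For concreteness, here is roughly what the two kinds of loss_fn_inputs look like. This is illustrative only: the values are placeholders rather than the SDK's actual tensor types, and the supervised "weights" key is an assumption about the cross_entropy path; the RL keys are the ones discussed in this PR.

    # Illustrative placeholders only -- not the SDK's exact tensor types.

    # Supervised-style loss_fn_inputs, which training_client currently expects
    # ("weights" here is an assumption about the cross_entropy loss):
    sl_loss_fn_inputs = {
        "target_tokens": [...],  # tokens to score
        "weights": [...],        # per-token loss weights
    }

    # RL-style loss_fn_inputs a user would pass for a custom RL loss:
    rl_loss_fn_inputs = {
        "target_tokens": [...],  # sampled tokens
        "logprobs": [...],       # sampler (behavior-policy) logprobs
        "advantages": [...],     # per-token or per-sequence advantages
    }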

The solution here is instead to detect whether the user is using an RL loss, based on the keys in the first Datum.loss_fn_inputs (a sketch of the detection follows the list below):

  • If advantages is present, it also asserts that the other expected keys (target_tokens, logprobs) are there
  • It then computes the on-policy logprobs with forward_future = await self.forward_async(data, "importance_sampling")
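A minimal sketch of the detection, written as a standalone helper for clarity. The helper name and its client argument are assumptions made for illustration; in the patch this logic would live inside forward_backward_custom_async, and the exact assertions and plumbing may differ.

    from typing import List

    async def _forward_with_key_detection(client, data: List["types.Datum"]):
        # Sketch only: choose the forward pass based on the keys present
        # in the first Datum's loss_fn_inputs.
        first_inputs = data[0].loss_fn_inputs
        if "advantages" in first_inputs:
            # RL case: if advantages are present, the other expected keys must be too.
            for key in ("target_tokens", "logprobs"):
                assert key in first_inputs, f"RL loss requires '{key}' in loss_fn_inputs"
            # On-policy logprobs come from the importance-sampling forward pass.
            forward_future = await client.forward_async(data, "importance_sampling")
        else:
            # Supervised case: keep the existing cross-entropy forward pass.
            forward_future = await client.forward_async(data, "cross_entropy")
        return await forward_future.result_async()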
