Surface model reasoning as delta.reasoning_content in streaming & sync responses#10
Merged
notabd7-deepshard merged 1 commit into `deepshard:main` on Feb 12, 2026
Conversation
## Summary

- Updated `_StreamFilter` to capture thinking/reasoning content instead of silently dropping it, emitting it as `delta.reasoning_content` in streaming SSE chunks and `message.reasoning_content` in sync responses
- Handles a `<think>` or `</think>` tag split across gRPC chunk boundaries
- Handles `<think>` blocks mid-response
- The leading `\n` after a `<think>` tag is stripped from the reasoning output
- Clients that don't read `reasoning_content` are unaffected (`reasoning_content` is an additive field)

## Problem
When using a reasoning model through the proxy, the model's chain-of-thought is completely stripped from responses. Clients that want to display or log reasoning have no way to access it.
To reproduce:
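The original repro command wasn't preserved in this page. A representative request against an OpenAI-compatible streaming endpoint would look like the following (the URL, port, and payload are assumptions, not taken from the PR):

```
# Hypothetical proxy address and request body -- adjust to your deployment.
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
        "stream": true
      }'
```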
The `<think>...</think>` content is silently dropped. Only `delta.content` chunks appear — no reasoning is surfaced anywhere in the streaming response. Today reasoning is only visible in the `debug.reasoning` field, and only when `--debug` is enabled.

## Solution
Instead of discarding thinking content in `_StreamFilter`, capture it and emit it as `delta.reasoning_content` (streaming) / `message.reasoning_content` (sync), matching the convention used by DeepSeek and OpenAI for reasoning models.

After this change, the same curl now produces:
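The actual streaming output wasn't captured in this page; under the scheme described above, the SSE chunks would take roughly this shape (field values illustrative, not from the PR):

```
data: {"choices":[{"index":0,"delta":{"reasoning_content":"Okay, the user is asking..."},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":"4"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```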
Non-streaming responses include both fields on the message:
```json
{
  "message": {
    "role": "assistant",
    "content": "4",
    "reasoning_content": "Okay, the user is asking..."
  }
}
```

`_StreamFilter` changes: `feed()` and `finalize()` now return `(visible, reasoning)` tuples instead of a single string. Phase 1 (initial CoT block) and Phase 2 (mid-stream `<think>` blocks) both capture reasoning instead of discarding it.
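A minimal sketch of the filter behavior described above — the class, method names, and the partial-tag buffering are assumptions reconstructed from this summary, not the proxy's actual source:

```python
# Illustrative sketch only: splits streamed model text into
# (visible, reasoning) parts, handling <think>/</think> tags that
# arrive split across chunk boundaries by buffering partial tags.
class StreamFilter:
    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self) -> None:
        self._buf = ""          # may end with a partial tag
        self._in_think = False  # True while inside a <think> block

    def feed(self, chunk: str) -> tuple[str, str]:
        self._buf += chunk
        visible: list[str] = []
        reasoning: list[str] = []
        while self._buf:
            tag = self.CLOSE if self._in_think else self.OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                part = self._buf[:idx]
                (reasoning if self._in_think else visible).append(part)
                self._buf = self._buf[idx + len(tag):]
                if not self._in_think:
                    # strip the leading newline right after <think>
                    self._buf = self._buf.lstrip("\n")
                self._in_think = not self._in_think
                continue
            # No complete tag: hold back the longest suffix that could
            # still turn out to be the start of a tag.
            keep = 0
            for k in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                if tag.startswith(self._buf[-k:]):
                    keep = k
                    break
            emit = self._buf[:len(self._buf) - keep]
            (reasoning if self._in_think else visible).append(emit)
            self._buf = self._buf[len(self._buf) - keep:]
            break
        return "".join(visible), "".join(reasoning)

    def finalize(self) -> tuple[str, str]:
        # Flush whatever remains (e.g. an unterminated block).
        leftover, self._buf = self._buf, ""
        return ("", leftover) if self._in_think else (leftover, "")
```

Feeding chunks like `"<thi"`, `"nk>\nreason"`, `"ing</think>4"` yields `"4"` as visible content and `"reasoning"` as reasoning content, with no tag fragments leaking into either stream.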
## Testing

Tested against Qwen3-30B-A3B:
- `reasoning_content` chunks flow, clean transition to `content`, no tag leaks
- `finish_reason` + `data: [DONE]` emitted correctly