Surface model reasoning as delta.reasoning_content in streaming & sync responses#10

Merged
notabd7-deepshard merged 1 commit into deepshard:main from robsltd:feat/reasoning-content
Feb 12, 2026
Conversation

robsltd (Contributor) commented Feb 11, 2026

Summary

  • Rework _StreamFilter to capture thinking/reasoning content instead of silently dropping it, emitting it as delta.reasoning_content in streaming SSE chunks and as message.reasoning_content in sync responses
  • Follows the DeepSeek / OpenAI convention for surfacing chain-of-thought in OpenAI-compatible APIs

Edge cases handled:

  • <think> tag split across gRPC chunk boundaries
  • Stream ending mid-think (before </think>)
  • Multiple <think> blocks mid-response
  • Leading \n after the <think> tag stripped from reasoning output
  • Non-reasoner models never emit reasoning_content
  • Existing clients that don't read reasoning_content are unaffected (additive field)

Problem

When using a reasoning model through the proxy, the model's chain-of-thought is completely stripped from responses. Clients that want to display or log reasoning have no way to access it.

To reproduce:

  1. Start the proxy with the reasoning model loaded
  2. Send a streaming request:
    curl -N http://127.0.0.1:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"auto","stream":true,"messages":[{"role":"user","content":"What is 2+2?"}]}'
  3. Observe: all <think>...</think> content is silently dropped. Only delta.content chunks appear — no reasoning is surfaced anywhere in the streaming response.
  4. For non-streaming, reasoning is only available via the non-standard debug.reasoning field and only when --debug is enabled.

Solution

Instead of discarding thinking content in _StreamFilter, capture it and emit it as delta.reasoning_content (streaming) / message.reasoning_content (sync), matching the convention used by DeepSeek and OpenAI for reasoning models.

After this change, the same curl now produces:

data: {"choices":[{"delta":{"reasoning_content":"Okay, the user is asking..."}}]}
data: {"choices":[{"delta":{"reasoning_content":"...simple arithmetic..."}}]}
data: {"choices":[{"delta":{"content":"4"}}]}
data: {"choices":[{"delta":{},"finish_reason":"stop"}]}
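Each `data:` line above is a standalone JSON chunk, so a client can reassemble reasoning and answer text by accumulating the two delta fields separately. A minimal sketch of such a client-side helper (hypothetical, not part of this PR; assumes Python 3.9+ for `str.removeprefix`):

```python
import json

def split_stream_events(sse_lines):
    """Accumulate reasoning vs. visible text from OpenAI-style SSE data lines.

    Illustrative helper: assumes each element is a 'data: {...}' line as in
    the stream above, terminated by 'data: [DONE]'.
    """
    reasoning, content = [], []
    for line in sse_lines:
        payload = line.removeprefix("data: ").strip()
        if not payload or payload == "[DONE]":
            continue  # keep-alive blank or end-of-stream sentinel
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)
```

Because reasoning_content is additive, a client that only reads delta.content behaves exactly as before.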

Non-streaming responses include both fields on the message:

{
  "message": {
    "role": "assistant",
    "content": "4",
    "reasoning_content": "Okay, the user is asking..."
  }
}
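Assembling that sync message from the filter's output can be sketched as follows (a hypothetical helper, not code from this PR): reasoning_content is attached only when reasoning text was actually captured, which is how non-reasoner models and existing clients see no new field.

```python
def build_message(visible: str, reasoning: str) -> dict:
    """Illustrative sketch: build the sync-response message dict.

    reasoning_content is added only when non-empty, keeping the field
    strictly additive for models that emit no <think> block.
    """
    msg = {"role": "assistant", "content": visible}
    if reasoning:
        msg["reasoning_content"] = reasoning
    return msg
```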

_StreamFilter changes: feed() and finalize() now return (visible, reasoning) tuples instead of a single string. Phase 1 (initial CoT block) and Phase 2 (mid-stream <think> blocks) both capture reasoning instead of discarding it.

Testing

Tested against Qwen3-30B-A3B:

  • Streaming: reasoning_content chunks flow, clean transition to content, no tag leaks
  • Non-streaming: both fields present and clean
  • finish_reason + data: [DONE] emitted correctly

notabd7-deepshard merged commit 599e9fd into deepshard:main on Feb 12, 2026
3 checks passed