
Conversation

@izhuhaoran (Contributor) commented Dec 16, 2025

Purpose

In async scheduling + speculative decoding, requests re-entering the input batch do not have pre-step draft tokens (since they were not running in the previous step). Therefore, any scheduled_spec_decode_tokens the scheduler assigns to these requests are essentially invalid placeholders. Retaining them leads to unnecessary computation and potentially unexpected behavior.

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

@izhuhaoran izhuhaoran changed the title [BugFix][Async] clear spec tokens for preempted or resumed reqs in async + spec decode [BugFix][Async] clear spec tokens for preempted or resumed reqs in async Dec 16, 2025
@mergify mergify bot added the v1 label Dec 16, 2025

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a bug in asynchronous scheduling with speculative decoding where preempted or resumed requests could have invalid speculative tokens. These requests don't have pre-step draft tokens, so any scheduled speculative tokens are incorrect.

The change correctly identifies these requests (those not in the persistent batch) and, when in async scheduling mode, clears any associated speculative tokens from the scheduler_output. This is done by removing the request ID from scheduled_spec_decode_tokens and adjusting total_num_scheduled_tokens and num_scheduled_tokens accordingly.

The implementation is clean and directly solves the described problem, preventing unnecessary computation and potential downstream errors. The logic appears sound and consistent with the existing codebase. I have no further suggestions.
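A minimal sketch of the clearing step described above, assuming it runs where GPUModelRunner detects a request missing from the persistent batch; the SchedulerOutput field names follow the vLLM v1 codebase, while the helper name and call site are illustrative:

```python
# Illustrative sketch only: the helper name and placement are hypothetical;
# scheduled_spec_decode_tokens, num_scheduled_tokens, and
# total_num_scheduled_tokens follow vLLM v1's SchedulerOutput fields.
def _clear_stale_spec_tokens(scheduler_output, req_id: str) -> None:
    """Drop placeholder draft tokens for a request re-entering the batch."""
    stale = scheduler_output.scheduled_spec_decode_tokens.pop(req_id, None)
    if stale:
        # The dropped draft tokens were counted against both the per-request
        # and the global token budgets, so decrement each by the same amount.
        scheduler_output.num_scheduled_tokens[req_id] -= len(stale)
        scheduler_output.total_num_scheduled_tokens -= len(stale)
```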

@izhuhaoran (Contributor, Author)

@njhill @benchislett Could you please review this PR when you have time?

@benchislett (Collaborator)

I don't fully understand the fix. What scenario is causing all these conditions to trigger? What behaviour is changing with this patch?

@izhuhaoran (Contributor, Author)

> I don't fully understand the fix. What scenario is causing all these conditions to trigger? What behaviour is changing with this patch?

Hi @benchislett, thanks for the review.

This PR addresses a corner case in Async Scheduling + Speculative Decoding. I observed this behavior sporadically in my own deployment. Since it is difficult to provide a deterministic reproduction script due to the specific timing and load conditions required, I will use a concrete example to illustrate the scenario.

Configuration:

  • max_num_batched_tokens = 40
  • num_spec_tokens = 3

Timeline:

  • Step N: Requests 0-10 are in the running queue.
  • Step N+1:
    • The scheduler processes the running queue [0, 1, ..., 9, 10].
    • Requests 0-9 are scheduled. They consume 10 reqs * 4 tokens (1 decode token + 3 draft tokens) = 40 tokens, so the budget is full (see the toy walkthrough after this timeline).
    • Request 10 is skipped (unscheduled) due to the budget limit.
    • Consequence: In GPUModelRunner, Request 10 is removed from the input_batch because it wasn't scheduled for this step. It loses its "active" status in the worker.
  • Step N+2:
    • Suppose Request 0 finishes (e.g., reaches max length), freeing up budget.
    • Request 10 is now scheduled.
    • The Scheduler assigns scheduled_spec_decode_tokens to Request 10 (as it is technically a "running" request).
    • The Conflict: In GPUModelRunner, Request 10 is treated as a "resumed" request (req_index is None) because it was missing from the input_batch in Step N+1. Since it didn't run in Step N+1, it has no cached draft tokens to verify.
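To make the budget arithmetic in Step N+1 concrete, here is a toy, self-contained walkthrough (pure illustration, not vLLM code):

```python
# Toy reproduction of the Step N+1 scheduling math from the timeline above.
max_num_batched_tokens = 40
num_spec_tokens = 3
tokens_per_req = 1 + num_spec_tokens  # 1 decode token + 3 draft tokens

budget = max_num_batched_tokens
scheduled = []
for req_id in range(11):  # requests 0-10 in the running queue
    if budget < tokens_per_req:
        break  # request 10 hits this: the budget is exhausted
    scheduled.append(req_id)
    budget -= tokens_per_req

print(scheduled)  # [0, 1, ..., 9]; request 10 is skipped this step
```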

This PR:
We can safely clear the scheduled_spec_decode_tokens for Request 10 at this step. Of course, even if we don’t clear it, nothing will break — it would only result in incorrect draft token IDs being computed (which would still be properly rejected by the rejection sampler). However, clearing it avoids unnecessary computation and helps prevent any unexpected bugs later on.

Why this PR is safe:
Since req_index is None and the request ID is still present in scheduled_spec_decode_tokens, we know the request is re-entering the execution batch and has no valid draft tokens from the immediately preceding step to verify, so clearing the spec tokens is safe. This change acts as a defensive measure: in normal continuous decoding this branch is never taken, so standard behavior is unaffected.

@izhuhaoran izhuhaoran changed the title [BugFix][Async] clear spec tokens for preempted or resumed reqs in async [BugFix][Async] Clear spec tokens for requests re-entering input batch in Async Dec 17, 2025
@izhuhaoran izhuhaoran changed the title [BugFix][Async] Clear spec tokens for requests re-entering input batch in Async [BugFix][Async] Clear spec tokens for requests re-entering input batch Dec 17, 2025
@izhuhaoran izhuhaoran marked this pull request as draft December 17, 2025 18:31
@izhuhaoran (Contributor, Author)

Update:
Currently, we only deepcopy the scheduler_output when the previous step's _draft_token_ids is None. Removing entries from scheduled_spec_decode_tokens here may therefore mutate the scheduler-side scheduled_spec_decode_tokens, which would in turn affect how update_from_output updates fields like num_computed_tokens and num_output_placeholders.
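A toy illustration of the aliasing hazard described above (not vLLM code): when no deepcopy is taken, the worker and the scheduler share the same dict, so a pop on the worker side changes what the scheduler later reads:

```python
# Scheduler-side state (toy stand-in for scheduler_output contents).
scheduled_spec_decode_tokens = {"req-10": [101, 102, 103]}

# Async path without a deepcopy: the worker holds the very same object.
worker_view = scheduled_spec_decode_tokens

# The worker clears the "stale" entry...
worker_view.pop("req-10")

# ...and the scheduler's own state has silently changed too.
assert "req-10" not in scheduled_spec_decode_tokens
```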

This PR could cause the scheduler’s and worker’s cached states for a request to become inconsistent, making issues more likely.

Since leaving the tokens in place does not cause any functional problems, we will drop this PR for now.

@izhuhaoran izhuhaoran closed this Dec 18, 2025