
Conversation

@RolaoDenthu

@RolaoDenthu RolaoDenthu commented Dec 21, 2025

What does this PR do?

Add comprehensive test coverage for SGLang generation backend, including functional tests, unit tests, and nightly tests.

  • Functional Test (tests/functional/grpo_sglang.sh): Quick validation of SGLang-based GRPO training
  • Unit Tests (tests/unit/models/generation/test_sglang_generation.py): unit tests covering:
    • Basic configuration validation
    • Policy generation and tensor parallelism
    • Worker seed behavior for RLHF diversity
    • HTTP server direct API access
    • Weight updates with DTensor policy (colocated mode)
    • Prefix cache reset after weight updates
  • Nightly Test (tests/test_suites/llm/grpo-qwen3-0.6b-1n8g-sglang.sh): End-to-end convergence test for SGLang backend

Usage

  • You can potentially add a usage example below
# Run functional test
uv add coverage
bash tests/functional/grpo_sglang.sh

# Run unit tests
uv sync --extra sglang --group test
uv run python -m pytest tests/unit/models/generation/test_sglang_generation.py -v --sglang-only
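
As a rough illustration of the "worker seed behavior" coverage listed above (this is a sketch, not the actual test code in this PR; the helper derive_worker_seed is hypothetical):

def derive_worker_seed(base_seed: int, worker_rank: int) -> int:
    # Each generation worker gets a distinct seed so RLHF rollouts stay diverse.
    return base_seed + worker_rank

def test_worker_seeds_are_distinct():
    seeds = [derive_worker_seed(base_seed=42, worker_rank=r) for r in range(4)]
    assert len(set(seeds)) == len(seeds)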

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

Release Notes

  • New Features

    • Distributed generation engine using SGLang backend with HTTP weight streaming and multi-GPU support.
  • Configuration

    • New YAML configuration templates for SGLang-based experiments with customizable generation parameters.
  • Tests

    • Comprehensive test coverage for SGLang generation, including tensor parallelism, batching, and dynamic weight updates.


PrinsYin and others added 30 commits December 6, 2025 21:12
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
…a server
…p servers
sglang: add 1B example
- Convert SGLangConfig from regular class to TypedDict inheriting GenerationConfig
- Align structure with VllmConfig pattern for consistency
- Mark all fields as NotRequired for backward compatibility
- Add sglang_kwargs field for additional ServerArgs parameters
- Add type casting in grpo.py for type safety

This maintains backward compatibility while aligning with the existing
generation config structure pattern.
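
A minimal sketch of that shape (only SGLangConfig, GenerationConfig, NotRequired, and sglang_kwargs are taken from this commit message; the other fields are illustrative):

from typing import Any, NotRequired, TypedDict

class GenerationConfig(TypedDict):
    # Simplified stand-in for the existing generation config structure.
    model_name: str

class SGLangConfig(GenerationConfig):
    # All SGLang-specific fields are NotRequired for backward compatibility.
    tensor_parallel_size: NotRequired[int]       # illustrative field
    mem_fraction_static: NotRequired[float]      # illustrative field
    sglang_kwargs: NotRequired[dict[str, Any]]   # extra ServerArgs parameters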

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Night <32424487+PrinsYin@users.noreply.github.com>
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@RolaoDenthu
Author

@guyueh1 Hi, I've fixed the environment and it should now run with the current uv lock. Could you please restart the test?

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 7, 2026
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 8, 2026
@github-actions

github-actions bot commented Jan 8, 2026

⚠️ File Consistency Check

Check based on commit: 570f996 (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@terrykong
Contributor

Hey @RolaoDenthu, I'm trying to get the examples in this PR to run and I'm running into some issues. I'll submit a PR against your branch tomorrow with some fixes.

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 8, 2026
@terrykong
Contributor

@RolaoDenthu PR for your review: RolaoDenthu#1

Also when i run:

uv run bash tests/functional/grpo_sglang.sh

I get

(IsolatedWorkerInitializer pid=2692982) /opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/layers/quantization/awq.py:62: UserWarning: Only CUDA and HIP support AWQ currently.
(IsolatedWorkerInitializer pid=2692982)   warnings.warn(f"Only CUDA and HIP support AWQ currently.")
Initializing sglang_policy workers: 100%|██████████| 1/1 [00:10<00:00, 10.91s/worker]
Traceback (most recent call last):
  File "/workspaces/nemo-rl/examples/run_grpo_math.py", line 260, in <module>
    main()
  File "/workspaces/nemo-rl/examples/run_grpo_math.py", line 192, in main
    ) = setup(config, tokenizer, dataset, val_dataset)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/algorithms/grpo.py", line 615, in setup
    policy_generation, policy = initialize_generation_with_policy(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/algorithms/grpo.py", line 545, in initialize_generation_with_policy
    policy_generation, generation_time = init_generation_fn()
                                         ^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/algorithms/grpo.py", line 489, in init_sglang
    pg = SGLangGeneration(cluster=inference_cluster, config=generation_config)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/models/generation/sglang/sglang_generation.py", line 130, in __init__
    self.worker_group = RayWorkerGroup(
                        ^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/distributed/worker_groups.py", line 398, in __init__
    self._create_workers_from_bundle_indices(
  File "/workspaces/nemo-rl/nemo_rl/distributed/worker_groups.py", line 593, in _create_workers_from_bundle_indices
    workers = ray.get(worker_refs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 968, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ImportError): ray::IsolatedWorkerInitializer.create_worker() (pid=2692982, ip=172.17.0.2, actor_id=633d878c3ae6e4c75bca36df01000000, repr=<nemo_rl.distributed.worker_groups.IsolatedWorkerInitializer object at 0x7db5bc0e9760>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/nemo-rl/nemo_rl/distributed/worker_groups.py", line 169, in create_worker
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/workspaces/nemo-rl/nemo_rl/models/generation/sglang/sglang_worker.py", line 26, in <module>
    from sglang.srt.entrypoints.http_server import launch_server
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/entrypoints/http_server.py", line 51, in <module>
    from sglang.srt.entrypoints.engine import _launch_subprocesses
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 43, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/managers/data_parallel_controller.py", line 39, in <module>
    from sglang.srt.managers.scheduler import run_scheduler_process
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 37, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/configs/model_config.py", line 32, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/layers/quantization/__init__.py", line 61, in <module>
    from sglang.srt.layers.quantization.w4afp8 import W4AFp8Config
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/layers/quantization/w4afp8.py", line 24, in <module>
    from sglang.srt.layers.moe.cutlass_w4a8_moe import cutlass_w4a8_moe
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sglang/srt/layers/moe/cutlass_w4a8_moe.py", line 6, in <module>
    from sgl_kernel import (
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/__init__.py", line 5, in <module>
    common_ops = _load_architecture_specific_ops()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/load_utils.py", line 188, in _load_architecture_specific_ops
    raise ImportError(error_msg)
ImportError: 
[sgl_kernel] CRITICAL: Could not load any common_ops library!

Attempted locations:
1. Architecture-specific pattern: /opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.* - found files: ['/opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.abi3.so']
2. Fallback pattern: /opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/common_ops.* - found files: []
3. Standard Python import: common_ops - failed

GPU Info:
- Compute capability: None
- Expected variant: CPU/No GPU detected (using precise math)

Please ensure sgl_kernel is properly installed with:
pip install --upgrade sgl_kernel

Error details from previous import attempts:
- ImportError: /opt/ray_venvs/nemo_rl.models.generation.sglang.sglang_worker.SGLangGenerationWorker/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.abi3.so: undefined symbol: _ZNK3c106SymInt6sym_neERKS0_
- ModuleNotFoundError: No module named 'common_ops'
ERROR:nemo_rl.models.generation.sglang.sglang_generation:sglang_generation.py:344: Error during SGLang policy shutdown: 'SGLangGeneration' object has no attribute 'worker_group'

Is sgl_kernel supposed to be a dependency?

@terrykong
Contributor

@RolaoDenthu From poking around in sglang's source I think this is due to the sgl-kernel bdist wheel being built against torch 2.8. We currently need to be on torch 2.9. I'm looking at sglang 0.5.5, and the bdist wheel seems to be built against torch 2.9 despite the sglang sdist wheel specifying torch 2.8; not sure if I'm reading that correctly

@terrykong
Contributor

terrykong commented Jan 9, 2026

@RolaoDenthu Also, another challenge we need to overcome is that the extras automodel and mcore include vllm, but sglang has many conflicting dependencies with vllm. Both inference backends and training backends are put together solely because there's some logic within each generation library for weight transfer that we want the training backend to use. We probably need to figure out how to generalize https://github.com/NVIDIA-NeMo/RL/pull/1638/changes (cc @yuki-97). As it stands, dtensor v2 basically works for sglang now, but not for vllm, because you've changed the py_executable for it.

@RolaoDenthu
Author

@RolaoDenthu From poking around in sglang's source I think this is due to the sgl-kernel bdist wheel being built against torch 2.8. We currently need to be on torch 2.9. I'm looking at sglang 0.5.5, and the bdist wheel seems to be built against torch 2.9 despite the sglang sdist wheel specifying torch 2.8; not sure if I'm reading that correctly

Thank you! I checked this before: sgl-kernel supports torch 2.8.0, and the next supported version after that is 2.9.1. So I can't find an appropriate version, since NeMo requires torch 2.9.0.

@terrykong
Contributor

terrykong commented Jan 9, 2026

@RolaoDenthu we took a look and it seems possible to decouple sglang from dtensor v2 (from consulting with @yuki-97 ). that would help a good deal with the complexity. I'll give that a try (will need to copy some sglang source needed to do so).

I think the bigger issue is the sgl-kernel library. Is it possible to build it from source and do a VCS install? I did see one single mention of torch 2.9.0, but it was only for cuda 13: https://github.com/sgl-project/sglang/blob/0c006b8809cd99e1f95926401a2823dd952641c8/sgl-kernel/build.sh#L23-L25. If we can VCS install, then we'd be able to compile it with whatever torch version nemo-rl is currently on

@RolaoDenthu
Author

RolaoDenthu commented Jan 9, 2026

@RolaoDenthu we took a look and it seems possible to decouple sglang from dtensor v2 (from consulting with @yuki-97 ). that would help a good deal with the complexity. I'll give that a try (will need to copy some sglang source needed to do so).

I think the bigger issue is the sgl-kernel library. Is it possible to build it from source and do a VCS install? I did see one single mention of torch 2.9.0, but it was only for cuda 13: https://github.com/sgl-project/sglang/blob/0c006b8809cd99e1f95926401a2823dd952641c8/sgl-kernel/build.sh#L23-L25. If we can VCS install, then we'd be able to compile it with whatever torch version nemo-rl is currently on

I checked with others in the sglang community, and the feedback was that there are known issues with torch 2.9.0 + cuDNN. If we really need to stay on 2.9.0, then we can only build sgl-kernel from source. I will try this.

@terrykong
Contributor

Thanks for checking. I'll give it a try as well.

Do you have a link to the issues for my understanding?

We can bump to 2.9.1 as well, but we usually encounter many issues when upgrading torch, so maybe we should first check whether there's a way to avoid the upgrade before resorting to it.

@RolaoDenthu
Author

I think pytorch/pytorch#166643 is the issue.

@guyueh1
Contributor

guyueh1 commented Jan 10, 2026

Regarding the sgl_kernel problem, I think @RolaoDenthu should try to see if compiling from source works for sgl_kernel + torch 2.9.0; we can't globally bump torch to 2.9.1 until vllm releases v0.14.0, which is the first release compatible with torch 2.9.1.

@terrykong Regarding vllm and sglang dependencies co-existing in the automodel and mcore venvs, this is indeed a tricky problem. A nice refactor would be to decouple the weight_sync functionality into different actors; for instance, we could define class VllmDTensorv2WeightSyncWorker and class SglDTensorv2WeightSyncWorker, where the first has its own venv that includes automodel + vllm and the latter has its own venv that includes automodel + sgl. If we want to do this, we first need to merge a refactor PR in nemo-rl that settles these interfaces; but if there is no good way to make sgl and vllm compatible in one environment, then this seems to be the only way.
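
To make the proposal concrete, a rough sketch of that split (the base-class name and method signature here are assumptions; only the two worker class names come from the comment above):

from abc import ABC, abstractmethod

class WeightSyncWorker(ABC):
    # Hypothetical common interface; each subclass would run in its own venv.
    @abstractmethod
    def sync_weights(self, state_dict_refs) -> None: ...

class VllmDTensorv2WeightSyncWorker(WeightSyncWorker):
    # Would live in a venv with automodel + vllm.
    def sync_weights(self, state_dict_refs) -> None:
        raise NotImplementedError("use vllm's weight-transfer path here")

class SglDTensorv2WeightSyncWorker(WeightSyncWorker):
    # Would live in a venv with automodel + sglang.
    def sync_weights(self, state_dict_refs) -> None:
        raise NotImplementedError("use sglang's weight-transfer path here")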


Labels

CI:L0 Run doctests and unit tests community-request

4 participants