Feat: wait kernel #41

niehao100 · 2025-11-06T04:07:02Z

Add CUDA Graph Support for Push-Pull Operations with Wait Kernel

Summary

This MR introduces CUDA Graph support for push-pull operations by implementing a wait kernel mechanism that enables synchronization between CPU and GPU threads in a CUDA Graph-compatible manner. The implementation consists of two main commits that progressively add the wait kernel functionality and enhance it with CUDA Graph support.

Changes

Core Features:

- Implemented CUDA kernels for flag-based synchronization:
  - write_flag_kernel: Writes a sequence number to a flag with a system-level memory fence
  - wait_flag_kernel: Waits until a flag reaches a target sequence number
- Added utility functions:
  - map_pinned_tensor: Maps pinned host memory into the device address space for zero-copy access
  - write_flag: Host interface for launching the flag write on the GPU
  - wait_flag: Host interface for launching the flag wait on the GPU
- Refactored header files:
  - Renamed util.h to util.hpp for consistency
  - Added conditional compilation for CUDA-dependent code
- Modified kernel signatures to use torch::Tensor instead of int64_t for sequence numbers:
  - This enables CUDA Graph capture: an integer passed by value from Python is fixed at capture time, whereas a tensor's contents are read on every replay
  - Updated write_flag and wait_flag to accept tensor-based sequence numbers
- Added seq_add_one kernel for incrementing sequence numbers within a CUDA Graph
- Enhanced push_pull function:
  - Added optional need_event parameter (default: true)
  - Allows disabling event recording when used inside a CUDA Graph
  - Enables more efficient graph execution without unnecessary event overhead
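
The flag protocol and the reason for tensor-based sequence numbers can be illustrated without a GPU. The following is a host-only C++ sketch under stated assumptions, not the CUDA code from this PR: std::atomic with release/acquire ordering stands in for the mapped pinned flag and the system-level fence, a plain integer read by reference stands in for the sequence tensor, and a vector of closures stands in for a captured graph. The function bodies, the Graph alias, run_handshake, and replay_frozen are hypothetical; only the names write_flag, wait_flag, and seq_add_one are borrowed from the description above.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Publish the current sequence number. The release store models the
// system-level fence: writes made before it are visible to the waiter.
void write_flag(std::atomic<std::int64_t>& flag, const std::int64_t& seq) {
  flag.store(seq, std::memory_order_release);
}

// Spin until the flag reaches the target sequence number.
void wait_flag(const std::atomic<std::int64_t>& flag, const std::int64_t& seq) {
  while (flag.load(std::memory_order_acquire) < seq) {
    std::this_thread::yield();
  }
}

// Advance the sequence number as a step of the graph itself.
void seq_add_one(std::int64_t& seq) { seq += 1; }

// Stand-in for a captured graph: steps recorded once, replayed many times.
using Graph = std::vector<std::function<void()>>;

void replay(const Graph& graph) {
  for (const auto& step : graph) step();
}

// A "GPU" thread replays the same recorded graph `rounds` times while the
// calling thread acts as the CPU side. Returns the payloads the CPU observed.
std::vector<std::int64_t> run_handshake(int rounds) {
  std::atomic<std::int64_t> ready{0}, done{0};
  std::int64_t seq = 0;      // read at replay time, like a tensor's contents
  std::int64_t payload = 0;  // data protected by the two flags

  Graph graph;
  graph.push_back([&] { seq_add_one(seq); });
  graph.push_back([&] { payload = seq * 10; });
  graph.push_back([&] { write_flag(ready, seq); });  // signal the CPU
  graph.push_back([&] { wait_flag(done, seq); });    // wait for the CPU's ack

  std::thread gpu([&] {
    for (int i = 0; i < rounds; ++i) replay(graph);
  });

  std::vector<std::int64_t> seen;
  for (std::int64_t k = 1; k <= rounds; ++k) {
    wait_flag(ready, k);      // block until round k has been published
    seen.push_back(payload);  // safe to read: ordered by the acquire load
    write_flag(done, k);      // let the replaying side continue
  }
  gpu.join();
  return seen;
}

// Counter-example: a value copied at record time is frozen into the step,
// so every replay publishes the same number even though seq keeps growing.
std::int64_t replay_frozen(int rounds) {
  std::atomic<std::int64_t> flag{0};
  std::int64_t seq = 1;
  Graph graph;
  graph.push_back([&flag, frozen = seq] { write_flag(flag, frozen); });
  for (int i = 0; i < rounds; ++i) {
    seq_add_one(seq);
    replay(graph);
  }
  return flag.load();
}
```

In this model, run_handshake(3) lets the waiter observe one fresh payload per replay because each step reads the sequence number through a reference when it runs. replay_frozen shows the failure mode that motivates the tensor-based signatures: the flag never advances past the value copied at record time, so a waiter expecting the next sequence number would spin forever.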

Testing

Run the test suite:

ROLE=joint RNIC=brainpf0 BIN=../fserver/test_kernel_wait bash tests/fserver/run_single_gpu.sh

niehao100 added 3 commits November 6, 2025 18:02

Add wait kernel impl

6c98756

Add push_pull/wait kernel graph support

7388307

Lint Code

2b7f86c

niehao100 force-pushed the feat/wait-kernel branch from 90c6682 to 2b7f86c November 6, 2025 10:04

niehao100 changed the title from "Feat/wait kernel" to "Feat: wait kernel" Nov 13, 2025

niehao100 merged commit 740ac12 into stepfun-ai:main Nov 13, 2025
1 check passed
