@SongXiaoXi SongXiaoXi commented Dec 30, 2025

Description

Fixes proxy-side cleanup of sharedDevMems buffers imported via cuMem, which otherwise leak GPU memory across communicator lifetimes. The import path keeps its allocation handles instead of releasing them early, so teardown must unmap each mapping, free its address range, and release the handle.

Related Issues

N/A

Changes & Impact

  • Ensures sharedDevMems cuMem mappings are released during proxy shutdown.
  • The leak can be triggered by ncclCommAbort() when the communicator uses the NET transport: abort may bypass the normal finalize path, but proxy resources still need to be torn down correctly.
  • No API changes; expected behavior is cleanup-only.
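
For readers unfamiliar with the teardown sequence described above, here is a minimal sketch of it. It is not the code in this PR: ImportedMapping, releaseImportedMapping, and the fake* functions are hypothetical names, and the fake* functions are logging stand-ins for the CUDA driver calls cuMemUnmap, cuMemRelease, and cuMemAddressFree so that the example runs without a GPU. The order shown (unmap, release handle, free address range) is one valid ordering, assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Records the order in which the stand-in "driver" calls are made.
static std::vector<std::string> g_calls;

// Stand-ins for cuMemUnmap / cuMemRelease / cuMemAddressFree.
// They only log the call and report success (0); no GPU is touched.
static int fakeMemUnmap(std::uintptr_t, std::size_t) {
  g_calls.push_back("unmap");
  return 0;
}
static int fakeMemRelease(std::uint64_t) {
  g_calls.push_back("release");
  return 0;
}
static int fakeMemAddressFree(std::uintptr_t, std::size_t) {
  g_calls.push_back("addressFree");
  return 0;
}

// Hypothetical record of one imported mapping held by the proxy.
struct ImportedMapping {
  std::uintptr_t ptr = 0;    // mapped device virtual address
  std::size_t size = 0;      // size of the reserved/mapped range
  std::uint64_t handle = 0;  // imported allocation handle, kept until teardown
  bool live = false;         // true while the mapping still needs cleanup
};

// Tears down one mapping. Returns 0 on success, or the first nonzero
// error code. Calling it again on an already released mapping is a no-op.
int releaseImportedMapping(ImportedMapping& m) {
  if (!m.live) return 0;
  int rc = fakeMemUnmap(m.ptr, m.size);                 // 1. remove the mapping
  if (rc == 0) rc = fakeMemRelease(m.handle);           // 2. drop the handle reference
  if (rc == 0) rc = fakeMemAddressFree(m.ptr, m.size);  // 3. return the VA range
  if (rc == 0) m = ImportedMapping{};                   // mark as released
  return rc;
}
```

In a shutdown path, a loop over all tracked mappings would call releaseImportedMapping on each one. Making the function idempotent means it stays safe if both the abort path and the normal finalize path reach the same cleanup.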

Performance Impact

Not measured; no expected runtime impact beyond cleanup on teardown.
Testing: Not run (not requested).
