
Conversation

@romerojosh
Collaborator

Related to the development in #99, this PR makes some additional device handling improvements in the TorchFort backend. In particular, it adds:

  1. Use of CUDAGuard objects within the supervised and RL functions to properly set/unset the current CUDA device to the expected model device. This does not fix any current issue in the implementation, but it better prepares the code for direct CUDA runtime calls (e.g., CUDA graph capture/replay).
  2. Checks on the user-supplied CUDA stream to ensure it is on the same device as the model.
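The guard-and-check pattern described above might look roughly like the following. This is a minimal sketch using libtorch's c10 CUDA API, not the PR's actual code; the function name `run_training_step` and the error message are illustrative.

```cpp
// Sketch only: assumes libtorch with CUDA support is available.
#include <stdexcept>
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>

// Runs one step with the current CUDA device pinned to the model's device.
void run_training_step(const c10::Device& model_device,
                       c10::cuda::CUDAStream stream) {
  // (2) Reject a user-supplied stream that lives on a different device
  //     than the model.
  if (stream.device_index() != model_device.index()) {
    throw std::runtime_error(
        "user-supplied CUDA stream is not on the model device");
  }

  // (1) CUDAGuard sets the current CUDA device to model_device for the
  //     duration of this scope and restores the previous device on exit
  //     (RAII), so subsequent direct CUDA runtime calls (e.g. graph
  //     capture/replay) target the expected device.
  c10::cuda::CUDAGuard guard(model_device);

  // ... enqueue forward/backward/optimizer work on `stream` ...
}
```

The RAII guard is what makes this safe to call from user code: even if the work inside throws, the caller's current-device setting is restored.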

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build_and_test

@github-actions

github-actions bot commented Dec 8, 2025

🚀 Build workflow triggered!

@github-actions

github-actions bot commented Dec 8, 2025

✅ Build workflow passed!

@romerojosh romerojosh requested a review from azrael417 December 8, 2025 19:01

2 participants