[tests, cuda] test: strengthen VRAM target and release assertions for keep sessions #64
Merged
Changes from all commits (4 commits)
9692d71 test(cuda): strengthen VRAM consumption assertions (Wangmerlyn)
7d1b46c Update tests/cuda_controller/test_context_manager.py (Wangmerlyn)
6551fa1 Update tests/cuda_controller/test_keep_and_release.py (Wangmerlyn)
56a3951 test(cuda): factor polling helper and dedupe VRAM checks (Wangmerlyn)
@@ -1,28 +1,85 @@
import time
import pytest
import torch

from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController
from tests.polling import wait_until


@pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason="Only run CUDA tests when CUDA is available",
)
def test_cuda_controller_context_manager():
    """Validate VRAM target consumption during keep and recovery after release."""
    ctrl = CudaGPUController(
        rank=torch.cuda.device_count() - 1,
        interval=0.05,
-       vram_to_keep="8MB",
+       vram_to_keep="64MB",
        relu_iterations=64,
    )

    torch.cuda.set_device(ctrl.rank)
    torch.cuda.empty_cache()
    torch.cuda.synchronize()

    target_bytes = int(ctrl.vram_to_keep) * 4  # float32 bytes
    free_bytes, _ = torch.cuda.mem_get_info(ctrl.rank)
    if free_bytes < int(target_bytes * 1.2):
        pytest.skip(
            f"Insufficient free VRAM for assertion test: need ~{target_bytes}, have {free_bytes}"
        )

    alloc_tolerance = 8 * 1024 * 1024
    reserve_tolerance = 16 * 1024 * 1024
    before_reserved = torch.cuda.memory_reserved(ctrl.rank)
    before_allocated = torch.cuda.memory_allocated(ctrl.rank)

    with ctrl:
        time.sleep(0.3)
        assert ctrl._thread and ctrl._thread.is_alive()
        during_reserved = torch.cuda.memory_reserved(ctrl.rank)
        assert during_reserved >= before_reserved
        peak_alloc_delta = 0
        peak_reserved_delta = 0

        def _target_reached() -> bool:
            nonlocal peak_alloc_delta, peak_reserved_delta
            alloc_delta = max(
                0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated
            )
            reserved_delta = max(
                0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved
            )
            peak_alloc_delta = max(peak_alloc_delta, alloc_delta)
            peak_reserved_delta = max(peak_reserved_delta, reserved_delta)
            # allocated should track the payload; reserved may be larger due to allocator blocks
            return (
                alloc_delta >= int(target_bytes * 0.95)
                and reserved_delta >= alloc_delta
            )

        reached_target = wait_until(_target_reached, timeout_s=3.0)

    assert not (ctrl._thread and ctrl._thread.is_alive())
    assert reached_target, (
        f"VRAM target not reached. target={target_bytes}, "
        f"peak_alloc_delta={peak_alloc_delta}, peak_reserved_delta={peak_reserved_delta}"
    )

    alloc_delta_after = -1
    reserved_delta_after = -1

    def _released() -> bool:
        nonlocal alloc_delta_after, reserved_delta_after
        alloc_after = torch.cuda.memory_allocated(ctrl.rank)
        reserved_after = torch.cuda.memory_reserved(ctrl.rank)
        alloc_delta_after = max(0, alloc_after - before_allocated)
        reserved_delta_after = max(0, reserved_after - before_reserved)
        return (
            alloc_delta_after <= alloc_tolerance
            and reserved_delta_after <= reserve_tolerance
            and not (ctrl._thread and ctrl._thread.is_alive())
        )

    released = wait_until(_released, timeout_s=3.0)

    assert released, (
        "VRAM did not return near baseline after release. "
        f"alloc_delta_after={alloc_delta_after}, reserved_delta_after={reserved_delta_after}"
    )
@@ -0,0 +1,16 @@
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 3.0,
    interval_s: float = 0.05,
) -> bool:
    """Poll predicate until it returns True or timeout is reached."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False
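For reference, a minimal usage sketch of the polling helper outside a CUDA context. This example test and its threading setup are illustrative, not part of the PR; it assumes the helper is importable as tests.polling, as the import in the diff above implies.

```python
# Illustrative only: demonstrates the polling pattern the CUDA tests rely on.
# wait_until re-evaluates the predicate every interval_s until it returns True
# or timeout_s elapses, and returns False on timeout instead of raising.
import threading

from tests.polling import wait_until  # assumed module path, per the import in the diff


def test_wait_until_observes_background_work():
    done = threading.Event()

    # Simulate asynchronous work (the keep loop in the real tests).
    threading.Timer(0.2, done.set).start()

    assert wait_until(done.is_set, timeout_s=1.0, interval_s=0.05)
    # A predicate that never becomes True makes wait_until report failure.
    assert not wait_until(lambda: False, timeout_s=0.2, interval_s=0.05)
```

Polling rather than sleeping a fixed interval keeps the assertions tolerant of slow CI GPUs while still bounding the total wait.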
`test_cuda_controller_respects_vram_target_during_keep` starts the background keep loop with `ctrl.keep()`, but cleanup only happens at the end of the happy path. If any assertion before `ctrl.release()` fails (for example the 3s target wait timing out on a busy CI GPU), the daemon thread keeps running and continues holding VRAM, which can cascade into unrelated CUDA test failures in the same pytest process. Use `try/finally` (or `with ctrl:`) so release is guaranteed on failure paths.
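A minimal sketch of the suggested pattern, assuming the keep()/release() API named in the comment and constructor arguments like those in the diff above; the asserted predicate is a placeholder for the real VRAM checks, and rank=0 is illustrative.

```python
# Sketch of the reviewer's suggestion: guarantee release even when an
# assertion (e.g. the 3s target wait) fails mid-test. keep()/release()
# come from the review comment; constructor arguments mirror the diff,
# with rank=0 as a placeholder device index.
from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController
from tests.polling import wait_until

ctrl = CudaGPUController(rank=0, interval=0.05, vram_to_keep="64MB")
ctrl.keep()
try:
    # Real VRAM-target assertions would go here; a timeout or assertion
    # failure still falls through to the finally block.
    assert wait_until(lambda: True, timeout_s=3.0)
finally:
    ctrl.release()  # always stop the keep thread and free VRAM

# Equivalently, the context manager form releases on any exit path:
with CudaGPUController(rank=0, interval=0.05, vram_to_keep="64MB"):
    assert wait_until(lambda: True, timeout_s=3.0)
```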