Conversation

@henrytsui000 (Member)

Stabilize training for v9-t and fix several pre-existing bugs.

Copilot AI review requested due to automatic review settings December 30, 2025 07:32

Copilot AI left a comment

Pull request overview

This PR prepares the codebase for the v1.0 release by stabilizing training for v9-t models and fixing several bugs. The changes focus on implementing gradient accumulation with warmup, fixing a widespread typo, and enhancing distributed-training support.

Key changes include:

  • Implementation of a new GradientAccumulation callback to dynamically adjust gradient accumulation during training warmup (see the short illustration after this list)
  • Updates to the EMA callback to properly account for gradient accumulation steps
  • Correction of the "quite" → "quiet" typo throughout configuration and documentation
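
As a rough illustration of the first two items, with hypothetical numbers that are not taken from the PR:

```python
# Hypothetical example: to reach an "equivalent" batch of 128 on 2 GPUs with a
# per-GPU batch of 16, the trainer accumulates 4 micro-batches before each
# optimizer step; per the second bullet, the EMA callback is updated to account
# for these accumulation steps as well.
equivalent_batch_size = 128
batch_size, world_size = 16, 2
accumulate_grad_batches = max(1, round(equivalent_batch_size / (batch_size * world_size)))
assert accumulate_grad_batches == 4
```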

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary of changes per file:

| File | Description |
| --- | --- |
| `yolo/utils/model_utils.py` | Added `GradientAccumulation` callback class for dynamic gradient accumulation during warmup; updated the EMA callback to work with gradient accumulation by tracking batch steps |
| `yolo/utils/logging_utils.py` | Integrated `GradientAccumulation` into setup; corrected "quite" typo; changed batch progress description; removed `clear_live()` call |
| `yolo/lazy.py` | Added configurable accelerator and device settings; enabled `sync_batchnorm` for distributed training; corrected "quite" typo |
| `yolo/config/task/train.yaml` | Added `equivalent_batch_size` parameter to support gradient accumulation configuration |
| `yolo/config/general.yaml` | Changed device configuration from fixed `0` to `auto` and added an accelerator setting |
| `yolo/config/config.py` | Added `equivalent_batch_size` field to `DataConfig` and `accelerator` field to `Config` (sketched below) |
| `tests/test_utils/test_bounding_box_utils.py` | Removed `.to(device)` call from model creation in a test (the model does not need to be on the device for this test) |
| `tests/conftest.py` | Removed `equivalent_batch_size` from the test configuration to prevent `GradientAccumulation` callback activation during tests |
| `docs/HOWTO.md` | Corrected "quite" → "quiet" in an example command |
| `README.md` | Corrected "quite" → "quiet" in documentation; removed WIP and PWC badges |
| `.github/workflows/deploy.yaml` | Corrected "quite" → "quiet" in a workflow command |
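
A rough sketch of what the new fields in `yolo/config/config.py` might look like; the field names come from the summary above, but the defaults, types, and surrounding fields are assumptions:

```python
# Hedged sketch only: the actual dataclasses in yolo/config/config.py contain
# many more fields and may use different defaults.
from dataclasses import dataclass


@dataclass
class DataConfig:
    batch_size: int = 16
    # Target "global" batch size reached via gradient accumulation during warmup.
    equivalent_batch_size: int = 64


@dataclass
class Config:
    # Passed through to the Lightning Trainer (e.g. "auto", "gpu", "cpu").
    accelerator: str = "auto"
```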

Comment on lines +79 to +88
```python
class GradientAccumulation(Callback):
    def __init__(self, data_cfg: DataConfig, scheduler_cfg: SchedulerConfig):
        super().__init__()
        self.equivalent_batch_size = data_cfg.equivalent_batch_size
        self.actual_batch_size = data_cfg.batch_size
        self.warmup_epochs = getattr(scheduler_cfg.warmup, "epochs", 0)
        self.current_batch = 0
        self.max_accumulation = 1
        self.warmup_batches = 0
        logger.info(":arrows_counterclockwise: Enable Gradient Accumulation")
```

Copilot AI Dec 30, 2025

The GradientAccumulation class lacks a docstring explaining its purpose, parameters, and behavior. Consider adding documentation to describe how this callback manages gradient accumulation during training warmup.
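
For illustration, a hedged sketch of what such a docstring could say, based only on the behaviour visible in this diff (wording is not from the PR):

```python
from lightning.pytorch.callbacks import Callback  # assumed import path; the repo may import it differently


class GradientAccumulation(Callback):
    """Ramp up gradient accumulation over the scheduler's warmup epochs.

    The target accumulation factor is equivalent_batch_size divided by the
    effective batch size (batch_size * world_size). During warmup the factor
    is linearly interpolated from 1 to that target; afterwards
    trainer.accumulate_grad_batches stays at the target value.
    """
```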

Comment on lines +79 to +108
```python
class GradientAccumulation(Callback):
    def __init__(self, data_cfg: DataConfig, scheduler_cfg: SchedulerConfig):
        super().__init__()
        self.equivalent_batch_size = data_cfg.equivalent_batch_size
        self.actual_batch_size = data_cfg.batch_size
        self.warmup_epochs = getattr(scheduler_cfg.warmup, "epochs", 0)
        self.current_batch = 0
        self.max_accumulation = 1
        self.warmup_batches = 0
        logger.info(":arrows_counterclockwise: Enable Gradient Accumulation")

    def setup(self, trainer: "Trainer", pl_module: "LightningModule", stage: str) -> None:
        effective_batch_size = self.actual_batch_size * trainer.world_size
        self.max_accumulation = max(1, round(self.equivalent_batch_size / effective_batch_size))
        batches_per_epoch = int(len(pl_module.train_loader) / trainer.world_size)
        self.warmup_batches = int(self.warmup_epochs * batches_per_epoch)

    def on_train_epoch_start(self, trainer: "Trainer", pl_module: "LightningModule") -> None:
        self.current_batch = trainer.global_step

    def on_train_batch_start(self, trainer: "Trainer", pl_module: "LightningModule", *args, **kwargs) -> None:
        if self.current_batch < self.warmup_batches:
            current_accumulation = round(lerp(1, self.max_accumulation, self.current_batch, self.warmup_batches))
        else:
            current_accumulation = self.max_accumulation
        trainer.accumulate_grad_batches = current_accumulation

    def on_train_batch_end(self, trainer: "Trainer", pl_module: "LightningModule", *args, **kwargs) -> None:
        self.current_batch += 1
```

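The warmup branch above relies on a `lerp` helper that is outside this diff. Below is a minimal sketch consistent with the call site `lerp(1, self.max_accumulation, self.current_batch, self.warmup_batches)`, assuming a `(start, end, step, total_steps)` signature; the helper actually defined in the repo may differ.

```python
def lerp(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from start to end as step goes from 0 to total_steps."""
    if total_steps <= 0:
        return float(end)
    ratio = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * ratio
```

Under this assumption, rounding the interpolated value gives a monotone 1 → max_accumulation ramp over the warmup batches, after which the accumulation stays at the maximum.
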

Copilot AI Dec 30, 2025

The new GradientAccumulation callback class lacks test coverage. Consider adding tests to verify the gradient accumulation logic, warmup behavior, and integration with the trainer's accumulate_grad_batches setting.
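
As a starting point, a hedged sketch of such a test, driving the callback with simple stand-in objects; the real `DataConfig`/`SchedulerConfig` types and the repo's existing fixtures may require different wiring, and the expected values assume a linear `lerp` as sketched above:

```python
from types import SimpleNamespace

from yolo.utils.model_utils import GradientAccumulation


def test_gradient_accumulation_warmup_ramp():
    data_cfg = SimpleNamespace(equivalent_batch_size=64, batch_size=8)
    scheduler_cfg = SimpleNamespace(warmup=SimpleNamespace(epochs=1))
    callback = GradientAccumulation(data_cfg, scheduler_cfg)

    trainer = SimpleNamespace(world_size=1, global_step=0, accumulate_grad_batches=1)
    pl_module = SimpleNamespace(train_loader=range(100))  # pretend 100 batches per epoch

    callback.setup(trainer, pl_module, stage="fit")
    assert callback.max_accumulation == 8   # 64 / (8 * 1)
    assert callback.warmup_batches == 100   # 1 warmup epoch * 100 batches

    callback.on_train_epoch_start(trainer, pl_module)
    seen = []
    for _ in range(150):  # run past the warmup boundary
        callback.on_train_batch_start(trainer, pl_module)
        seen.append(trainer.accumulate_grad_batches)
        callback.on_train_batch_end(trainer, pl_module)

    assert seen[0] == 1            # warmup starts without accumulation
    assert seen[-1] == 8           # full factor is reached after warmup
    assert seen == sorted(seen)    # the ramp never decreases
```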

@henrytsui000 (Member, Author)

@copilot open a new pull request to apply changes based on this feedback

Copilot AI commented Dec 30, 2025

@henrytsui000 I've opened a new pull request, #226, to work on those changes. Once the pull request is ready, I'll request review from you.

henrytsui000 merged commit c4cb5f6 into main on Dec 30, 2025
9 checks passed