Release/v1.0 #225
Conversation
Pull request overview
This PR prepares the codebase for the v1.0 release by stabilizing training for v9-t models and fixing several bugs. The changes focus on implementing gradient accumulation with warmup, fixing a widespread typo, and enhancing distributed training support.
Key changes include:
- Implementation of a new `GradientAccumulation` callback to dynamically adjust gradient accumulation during training warmup
- Updates to the EMA callback to properly account for gradient accumulation steps
- Correction of the "quite" → "quiet" typo throughout configuration and documentation
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| yolo/utils/model_utils.py | Added GradientAccumulation callback class for dynamic gradient accumulation during warmup; updated EMA callback to work with gradient accumulation by tracking batch steps |
| yolo/utils/logging_utils.py | Integrated GradientAccumulation into setup; corrected "quite" typo; changed batch progress description; removed clear_live() call |
| yolo/lazy.py | Added configurable accelerator and device settings; enabled sync_batchnorm for distributed training; corrected "quite" typo |
| yolo/config/task/train.yaml | Added equivalent_batch_size parameter to support gradient accumulation configuration |
| yolo/config/general.yaml | Changed device configuration from fixed 0 to auto and added accelerator setting |
| yolo/config/config.py | Added equivalent_batch_size field to DataConfig and accelerator field to Config (see the sketch after this table) |
| tests/test_utils/test_bounding_box_utils.py | Removed .to(device) call from model creation in test (model doesn't need to be on device for this test) |
| tests/conftest.py | Removed equivalent_batch_size from test configuration to prevent GradientAccumulation callback activation during tests |
| docs/HOWTO.md | Corrected "quite" → "quiet" in example command |
| README.md | Corrected "quite" → "quiet" in documentation; removed WIP and PWC badges |
| .github/workflows/deploy.yaml | Corrected "quite" → "quiet" in workflow command |
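For context, the new configuration fields described in the table might look roughly like this in `yolo/config/config.py`. The field names (`equivalent_batch_size`, `accelerator`, `device`) come from this PR, but the types, defaults, and surrounding structure are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataConfig:
    batch_size: int
    # Target global batch size; the GradientAccumulation callback derives its
    # accumulation factor from equivalent_batch_size / (batch_size * world_size).
    # Optional so tests can omit it and skip the callback (see tests/conftest.py).
    equivalent_batch_size: Optional[int] = None


@dataclass
class Config:
    # Passed through to the Lightning Trainer; "auto" picks GPU/CPU as available.
    accelerator: str = "auto"
    device: str = "auto"
```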
```python
class GradientAccumulation(Callback):
    def __init__(self, data_cfg: DataConfig, scheduler_cfg: SchedulerConfig):
        super().__init__()
        self.equivalent_batch_size = data_cfg.equivalent_batch_size
        self.actual_batch_size = data_cfg.batch_size
        self.warmup_epochs = getattr(scheduler_cfg.warmup, "epochs", 0)
        self.current_batch = 0
        self.max_accumulation = 1
        self.warmup_batches = 0
        logger.info(":arrows_counterclockwise: Enable Gradient Accumulation")
```
Copilot AI · Dec 30, 2025
The GradientAccumulation class lacks a docstring explaining its purpose, parameters, and behavior. Consider adding documentation to describe how this callback manages gradient accumulation during training warmup.
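A docstring along the following lines would address this; the wording is a sketch inferred from the behavior visible in the diff, not text from the PR:

```python
class GradientAccumulation(Callback):
    """Ramp up gradient accumulation over the scheduler's warmup period.

    The target accumulation factor is ``round(equivalent_batch_size /
    (batch_size * world_size))``. During the warmup epochs the factor is
    linearly interpolated from 1 up to that target and written to
    ``trainer.accumulate_grad_batches`` on every batch; after warmup it
    stays at the target.

    Args:
        data_cfg: Supplies ``equivalent_batch_size`` and ``batch_size``.
        scheduler_cfg: Supplies ``warmup.epochs`` (treated as 0 if absent).
    """
```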
```python
class GradientAccumulation(Callback):
    def __init__(self, data_cfg: DataConfig, scheduler_cfg: SchedulerConfig):
        super().__init__()
        self.equivalent_batch_size = data_cfg.equivalent_batch_size
        self.actual_batch_size = data_cfg.batch_size
        self.warmup_epochs = getattr(scheduler_cfg.warmup, "epochs", 0)
        self.current_batch = 0
        self.max_accumulation = 1
        self.warmup_batches = 0
        logger.info(":arrows_counterclockwise: Enable Gradient Accumulation")

    def setup(self, trainer: "Trainer", pl_module: "LightningModule", stage: str) -> None:
        effective_batch_size = self.actual_batch_size * trainer.world_size
        self.max_accumulation = max(1, round(self.equivalent_batch_size / effective_batch_size))
        batches_per_epoch = int(len(pl_module.train_loader) / trainer.world_size)
        self.warmup_batches = int(self.warmup_epochs * batches_per_epoch)

    def on_train_epoch_start(self, trainer: "Trainer", pl_module: "LightningModule") -> None:
        self.current_batch = trainer.global_step

    def on_train_batch_start(self, trainer: "Trainer", pl_module: "LightningModule", *args, **kwargs) -> None:
        if self.current_batch < self.warmup_batches:
            current_accumulation = round(lerp(1, self.max_accumulation, self.current_batch, self.warmup_batches))
        else:
            current_accumulation = self.max_accumulation
        trainer.accumulate_grad_batches = current_accumulation

    def on_train_batch_end(self, trainer: "Trainer", pl_module: "LightningModule", *args, **kwargs) -> None:
        self.current_batch += 1
```
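For intuition, here is a worked example of the warmup schedule. The `lerp` helper is not shown in this diff; the definition below is an assumed standard linear interpolation with the same call signature:

```python
def lerp(start: float, end: float, step: int, total: int) -> float:
    # Assumed semantics: interpolate from start to end as step goes 0 -> total.
    return start + (end - start) * (step / total)

# With equivalent_batch_size=64, batch_size=8, world_size=1 and 10 warmup
# batches, max_accumulation = round(64 / 8) = 8 and the ramp looks like:
schedule = [round(lerp(1, 8, batch, 10)) for batch in range(10)]
print(schedule)  # [1, 2, 2, 3, 4, 4, 5, 6, 7, 7], then 8 for every later batch
```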
Copilot AI · Dec 30, 2025
The new GradientAccumulation callback class lacks test coverage. Consider adding tests to verify the gradient accumulation logic, warmup behavior, and integration with the trainer's accumulate_grad_batches setting.
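As a starting point, a unit test along these lines would exercise the warmup ramp without a real Trainer. It fakes the trainer and module with `SimpleNamespace` objects; the import path and config field names are taken from this PR's diff:

```python
from types import SimpleNamespace

from yolo.utils.model_utils import GradientAccumulation


def test_gradient_accumulation_warmup_ramp():
    data_cfg = SimpleNamespace(equivalent_batch_size=64, batch_size=8)
    scheduler_cfg = SimpleNamespace(warmup=SimpleNamespace(epochs=1))
    callback = GradientAccumulation(data_cfg, scheduler_cfg)

    trainer = SimpleNamespace(world_size=1, global_step=0, accumulate_grad_batches=1)
    pl_module = SimpleNamespace(train_loader=[None] * 10)  # 10 batches per epoch

    callback.setup(trainer, pl_module, stage="fit")
    assert callback.max_accumulation == 8   # round(64 / (8 * 1))
    assert callback.warmup_batches == 10    # 1 warmup epoch * 10 batches

    callback.on_train_epoch_start(trainer, pl_module)
    seen = []
    for _ in range(12):
        callback.on_train_batch_start(trainer, pl_module)
        seen.append(trainer.accumulate_grad_batches)
        callback.on_train_batch_end(trainer, pl_module)

    assert seen[0] == 1                 # warmup starts from no accumulation
    assert seen == sorted(seen)         # the factor only ever ramps up
    assert seen[-1] == 8                # and settles at max_accumulation
```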
henrytsui000
@copilot open a new pull request to apply changes based on this feedback
Copilot AI
@henrytsui000 I've opened a new pull request, #226, to work on those changes. Once the pull request is ready, I'll request review from you.
Stabilize training for v9-t and fix some previous bugs.