This repository was archived by the owner on Jul 22, 2024. It is now read-only.

fix the deadlock problem when using distributed training in VQA finetune #197

Open
Light-V wants to merge 1 commit into microsoft:master from Light-V:master

Conversation


@Light-V Light-V commented May 19, 2022

When using distributed training, the processes with local_rank != 0 never call torch.distributed.barrier(). Since the barrier is a collective operation, rank 0 blocks at its own barrier call and the job deadlocks.
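For context, this is the usual "first-rank builds the cache" pattern such finetuning scripts follow, where every rank must reach a matching barrier. A minimal sketch only; the names `args.local_rank` and `build_vqa_features()` are illustrative assumptions, not code from this repository:

```python
import torch.distributed as dist

def prepare_features(args):
    # Non-zero ranks wait here while rank 0 builds the feature cache.
    # If any of these ranks skips this call, rank 0 hangs at its barrier below.
    if args.local_rank not in (-1, 0):
        dist.barrier()

    # Hypothetical preprocessing step: rank 0 builds and caches the VQA
    # features; the other ranks load the cache once they are released.
    features = build_vqa_features(args)

    # Rank 0 joins the same collective, releasing the waiting ranks.
    if args.local_rank == 0:
        dist.barrier()

    return features
```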

