Description
I'm running a 4-GPU cluster and use this node package for load balancing and convenience, so I don't have to manage 4 instances myself.
When using the Distributed Queue node, the master node never assigns a job to itself, even though it's NOT set to orchestrator-only.
The workaround I've found for now is to run a 5th instance on cuda0 and set the master to orchestrator-only. However, since both instances share cuda0, if the master fails to assign a job to a worker and falls back to local execution, both instances can end up working on cuda0 at the same time and run out of memory (OOM).
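To make the expected behavior concrete, here is a minimal sketch of how I'd expect job distribution to work. It is a hypothetical model and not the package's actual scheduler: the function names, the device labels, and the round-robin policy are all my own illustrative assumptions. The point is only that a master which is not orchestrator-only should be part of the rotation.

```python
from itertools import cycle


def build_participants(master_id, worker_ids, orchestrator_only):
    """Return the instances that should receive jobs (hypothetical model)."""
    participants = list(worker_ids)
    if not orchestrator_only:
        # Expected: the master joins the rotation unless it is orchestrator-only.
        participants.insert(0, master_id)
    return participants


def assign_jobs(jobs, participants):
    """Round-robin jobs across participants, returning (job, target) pairs."""
    # zip stops when the job list is exhausted; cycle repeats the participants.
    return list(zip(jobs, cycle(participants)))


participants = build_participants(
    "master:cuda0",
    ["worker:cuda1", "worker:cuda2", "worker:cuda3"],
    orchestrator_only=False,
)
for job, target in assign_jobs(["job1", "job2", "job3", "job4"], participants):
    print(job, "->", target)
```

With `orchestrator_only=False`, `job1` lands on `master:cuda0` and the other three jobs go to the workers; what I observe instead is that the master never appears as a target.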