Master node not working when using distributed queue

I'm running a 4-GPU cluster and use this node package for load balancing and convenience, so I don't have to manage four instances myself.
When I use the Distributed Queue node, the master node never assigns a job to itself, even though it is NOT set to orchestrator-only.
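To make the expected behavior concrete, here is a minimal Python sketch of how job assignment would be expected to work. It assumes a simple round-robin scheduler; the package's real scheduling code is not shown here, and `Instance`, `eligible_executors` and `assign_round_robin` are hypothetical names, not the package's API. The point is that a master that is not orchestrator-only should be part of the executor pool.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    # Hypothetical model of one ComfyUI instance, not the package's own type.
    name: str
    device: str  # e.g. "cuda:0"


def eligible_executors(master: Instance, workers: list[Instance],
                       orchestrator_only: bool) -> list[Instance]:
    """Return the instances allowed to run jobs (assumed behavior)."""
    pool = list(workers)
    if not orchestrator_only:
        # A master that is not orchestrator-only should also take jobs.
        pool.insert(0, master)
    return pool


def assign_round_robin(jobs: list[str], pool: list[Instance]) -> dict[str, str]:
    """Map each job to an instance name, cycling through the pool."""
    if not pool:
        raise ValueError("no executors available")
    return {job: inst.name for job, inst in zip(jobs, cycle(pool))}


master = Instance("master", "cuda:0")
workers = [Instance(f"worker{i}", f"cuda:{i}") for i in range(1, 4)]
pool = eligible_executors(master, workers, orchestrator_only=False)
print(assign_round_robin(["j0", "j1", "j2", "j3"], pool))
```

With four jobs and orchestrator-only off, the sketch gives one job to each GPU, including `master` on `cuda:0`. The reported behavior is closer to what happens here with `orchestrator_only=True`, where `cuda:0` never receives work.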
My current workaround is to run a 5th instance on cuda0 as a worker and set the master to orchestrator-only. However, because both instances are on cuda0, if the master fails to assign a job to a worker and falls back to local execution, both instances can end up working on cuda0 at the same time and run out of memory (OOM).
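The OOM risk in this workaround comes down to one decision: what the master does when dispatch fails. Below is a minimal sketch of a guard that would avoid it, assuming the master knows which CUDA devices its workers use. `fallback_action` is a hypothetical helper for illustration, not something the package provides.

```python
def fallback_action(master_device: str, worker_devices: list[str],
                    dispatch_succeeded: bool) -> str:
    """Decide what to do with a job after trying to dispatch it (hypothetical)."""
    if dispatch_succeeded:
        return "dispatched"
    if master_device in worker_devices:
        # Running locally would share a GPU with a worker instance,
        # which is the OOM scenario described above, so requeue instead.
        return "requeue"
    # The master's GPU is not used by any worker, so local execution is safe.
    return "run_locally"


print(fallback_action("cuda:0", ["cuda:0", "cuda:1", "cuda:2", "cuda:3"], False))
```

In the 5-instance setup, the master and the extra worker both use `cuda:0`, so the guard requeues the job rather than running it locally.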

Master node not working when using distributed queue #72