Skip to content

scrept for ray cluster trining #150

@AliMamaghani1999

Description

@AliMamaghani1999

I created a ray cluster with Helm and tried running the scripts below for multi-node training, but it got stuck at the first step of training (see the image below). Could you please help me with this?

Image

train_1.5b_grpo_tool server.sh
train_1.5b_grpo.sh

ray-values-verl.yaml

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions