Ported the repo to rel-2.3.0 #2

Open

srinjoym-cerebras wants to merge 4 commits into Cerebras:main from srinjoym-cerebras:port_rel-2.3.0

Conversation

@srinjoym-cerebras

In rel-2.3.0, a new trainer flow and YAML structure were introduced. This PR makes the corresponding changes to the repo.

README.md Outdated

```bash
python train.py configs/111m.yaml
python train.py CSX --mode train --params configs/111b.yaml
```

Is the CSX argument needed? We should have a way to specify the device outside of CSX if needed, but CSX should be the default device.

Author


Made CSX the default device and updated the README.md.
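A minimal sketch of what a default-device CLI might look like, using argparse. The argument names and device list here are assumptions for illustration, not the repo's actual flags:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI sketch: the device is an optional positional argument
    # defaulting to CSX, so `python train.py --params config.yaml` needs no
    # explicit device while `python train.py CPU --params ...` still works.
    parser = argparse.ArgumentParser(description="gigaGPT training entry point")
    parser.add_argument(
        "target_device",
        nargs="?",
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="device to run on (defaults to CSX)",
    )
    parser.add_argument("--params", required=True, help="path to the YAML config")
    return parser.parse_args(argv)

args = parse_args(["--params", "configs/111m.yaml"])
```

With this shape, omitting the device falls back to CSX, matching the reviewer's request that CSX be the default.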

```yaml
- /path/to/data/location
python_paths:
- /path/to/code/location
trainer:
```
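For context, the rel-2.3.0 YAML format is organized around a top-level `trainer` section. The sketch below shows the assumed general shape; the keys and values are illustrative, not this repo's actual config:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX   # illustrative; other devices also possible
    model:
      # model hyperparameters go here
  fit:
    train_dataloader:
      data_dir: /path/to/data/location
```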


Thanks for this. I understand we were able to run on CPU with these changes, but we will also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are changes made by @srinjoym-cerebras to port our gigaGPT model to the new trainer flow. I think he was able to run it on CPU, but we will need to verify these changes on CSX.

```python
causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
causal_attention_mask = create_broadcasted_autoregressive_mask(
    batch_size=batch_size,
    num_heads=1,
```


Suggested change:

```diff
-num_heads=1,
+num_heads=self.config.heads,
```
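For reference, here is a plain-NumPy sketch of the mask such a helper is assumed to produce. The real `create_broadcasted_autoregressive_mask` lives in the Cerebras modelzoo; this stand-in only illustrates the shape and fill semantics under discussion:

```python
import numpy as np

def autoregressive_mask(batch_size, num_heads, seq_len, dtype=np.float32):
    # Positions above the diagonal (future tokens) get the most negative
    # representable value, so adding the mask to attention scores drives
    # their softmax weights to zero; allowed positions stay at 0.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=dtype), k=1)
    mask = mask * np.finfo(dtype).min
    # Broadcast to (batch, num_heads, seq_len, seq_len) for per-head scores.
    return np.broadcast_to(mask, (batch_size, num_heads, seq_len, seq_len))

mask = autoregressive_mask(batch_size=2, num_heads=4, seq_len=3)
```

Whether `num_heads` is 1 (relying on broadcasting against the head dimension of the scores) or the full head count only changes the materialized shape, which is presumably what the suggestion above is weighing.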


Creating mask explicitly in #5 would be preferable.

```python
save_checkpoint(step)

logger.info("Training completed successfully!")
from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```


I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.

Author


Sorry, could you clarify what you mean by hiding the code?


Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model size scaling on Cerebras hardware using only simple and readable PyTorch code.

If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, as it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
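To make the trade-off concrete, here is a toy, self-contained stand-in for an explicit training loop: plain Python fitting a single weight by gradient descent. gigaGPT itself uses PyTorch, but the point is the same — the forward pass, loss, and update are all visible at the call site rather than behind a library `main`:

```python
def train(steps=100, lr=0.1, target_w=3.0):
    # Fit w so that the prediction w * x matches target_w * x for x = 1.0.
    w = 0.0
    for _ in range(steps):
        x = 1.0
        pred = w * x
        loss = (pred - target_w) ** 2          # squared error, written out here
        grad = 2.0 * (pred - target_w) * x     # d(loss)/dw, derived by hand
        w -= lr * grad                         # SGD update, no helper hides it
    return w

final_w = train()  # converges toward target_w = 3.0
```

Delegating to `run_utils.main` collapses all of the above into one opaque call, which is exactly the readability concern raised here.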

Author


ok, got it.

```python
def main():
    params = get_params_from_args()

from cerebras.modelzoo.common.run_utils import main
```

@mohitk-cerebras Sep 3, 2024


Do we need to add the Cerebras modelzoo dependency in requirements.txt? Please note that this would be a new dependency. cc @gokulr-cerebras
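If the dependency is added, the requirements.txt entry might look like the following. The package name and version pin are assumptions and should be confirmed against how modelzoo is actually distributed:

```
cerebras-modelzoo==2.3.0  # hypothetical name/pin; confirm before adding
```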
