Ported the repo to rel-2.3.0 #2

Open

srinjoym-cerebras wants to merge 4 commits into Cerebras:main from srinjoym-cerebras:port_rel-2.3.0

Conversation

@srinjoym-cerebras

In rel-2.3.0, a new trainer flow and YAML structure were introduced. This PR makes the corresponding changes to the repo.

README.md Outdated

```bash
python train.py configs/111m.yaml
python train.py CSX --mode train --params configs/111b.yaml
```

Is the CSX argument needed? We should have a way to specify the device outside of CSX if needed, but CSX should be the default device.

Author


Made CSX the default device and updated the README.md.
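A minimal sketch of what a default-device CLI might look like, using argparse. The argument names and device list here are assumptions for illustration, not the repo's actual flags:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI sketch: the device is an optional positional argument
    # defaulting to CSX, so `python train.py --params config.yaml` needs no
    # explicit device while `python train.py CPU --params ...` still works.
    parser = argparse.ArgumentParser(description="gigaGPT training entry point")
    parser.add_argument(
        "target_device",
        nargs="?",
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="device to run on (defaults to CSX)",
    )
    parser.add_argument("--params", required=True, help="path to the YAML config")
    return parser.parse_args(argv)

args = parse_args(["--params", "configs/111m.yaml"])
```

With this shape, omitting the device falls back to CSX, matching the reviewer's request that CSX be the default.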

```yaml
- /path/to/data/location
python_paths:
- /path/to/code/location
trainer:
```
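For context, the rel-2.3.0 YAML format is organized around a top-level `trainer` section. The sketch below shows the assumed general shape; the keys and values are illustrative, not this repo's actual config:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX   # illustrative; other devices also possible
    model:
      # model hyperparameters go here
  fit:
    train_dataloader:
      data_dir: /path/to/data/location
```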


Thanks for this. I understand we were able to run on CPU with these changes, but we will also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are changes made by @srinjoym-cerebras to port our gigaGPT model to the new trainer flow. I think he was able to run it on CPU, but we will need to verify these changes on CSX.

```python
causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
causal_attention_mask = create_broadcasted_autoregressive_mask(
    batch_size=batch_size,
    num_heads=1,
```


Suggested change:

```diff
-num_heads=1,
+num_heads=self.config.heads,
```
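For reference, here is a plain-NumPy sketch of the mask such a helper is assumed to produce. The real `create_broadcasted_autoregressive_mask` lives in the Cerebras modelzoo; this stand-in only illustrates the shape and fill semantics under discussion:

```python
import numpy as np

def autoregressive_mask(batch_size, num_heads, seq_len, dtype=np.float32):
    # Positions above the diagonal (future tokens) get the most negative
    # representable value, so adding the mask to attention scores drives
    # their softmax weights to zero; allowed positions stay at 0.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=dtype), k=1)
    mask = mask * np.finfo(dtype).min
    # Broadcast to (batch, num_heads, seq_len, seq_len) for per-head scores.
    return np.broadcast_to(mask, (batch_size, num_heads, seq_len, seq_len))

mask = autoregressive_mask(batch_size=2, num_heads=4, seq_len=3)
```

Whether `num_heads` is 1 (relying on broadcasting against the head dimension of the scores) or the full head count only changes the materialized shape, which is presumably what the suggestion above is weighing.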


Creating mask explicitly in #5 would be preferable.

```python
save_checkpoint(step)

logger.info("Training completed successfully!")
from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```


I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.

Author


Sorry, could you clarify what you mean by hiding the code?


Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model size scaling on Cerebras hardware using only simple and readable PyTorch code.

If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, as it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
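To make the trade-off concrete, here is a toy, self-contained stand-in for an explicit training loop: plain Python fitting a single weight by gradient descent. gigaGPT itself uses PyTorch, but the point is the same — the forward pass, loss, and update are all visible at the call site rather than behind a library `main`:

```python
def train(steps=100, lr=0.1, target_w=3.0):
    # Fit w so that the prediction w * x matches target_w * x for x = 1.0.
    w = 0.0
    for _ in range(steps):
        x = 1.0
        pred = w * x
        loss = (pred - target_w) ** 2          # squared error, written out here
        grad = 2.0 * (pred - target_w) * x     # d(loss)/dw, derived by hand
        w -= lr * grad                         # SGD update, no helper hides it
    return w

final_w = train()  # converges toward target_w = 3.0
```

Delegating to `run_utils.main` collapses all of the above into one opaque call, which is exactly the readability concern raised here.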

Author


ok, got it.

```python
def main():
    params = get_params_from_args()

from cerebras.modelzoo.common.run_utils import main
```

@mohitk-cerebras Sep 3, 2024


Do we need to add the Cerebras modelzoo dependency in requirements.txt? Please note that this would be a new dependency. cc @gokulr-cerebras
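If the dependency is added, the requirements.txt entry might look like the following. The package name and version pin are assumptions and should be confirmed against how modelzoo is actually distributed:

```
cerebras-modelzoo==2.3.0  # hypothetical name/pin; confirm before adding
```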
