README.md
```diff
-python train.py configs/111m.yaml
+python train.py CSX --mode train --params configs/111b.yaml
```
Is the CSX argument needed? We should have a way to specify the device if needed outside of CSX, but CSX should be the default device.
Made CSX the default device and updated the README.md
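For illustration only, a minimal `argparse` sketch (hypothetical, not the actual Model Zoo CLI) of how the device can default to CSX while still being overridable, matching the behavior agreed on above:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch: the device is an optional positional argument
    # that falls back to CSX when omitted.
    parser = argparse.ArgumentParser(description="gigaGPT training entry point")
    parser.add_argument(
        "device",
        nargs="?",
        default="CSX",
        choices=["CSX", "CPU", "GPU"],
        help="target device (defaults to CSX)",
    )
    parser.add_argument("--mode", default="train", help="run mode")
    parser.add_argument("--params", required=True, help="path to the YAML config")
    return parser

# Omitting the device falls back to CSX; passing one overrides it.
default_args = build_parser().parse_args(["--params", "configs/111m.yaml"])
cpu_args = build_parser().parse_args(["CPU", "--params", "configs/111m.yaml"])
```

With this shape, `python train.py --params configs/111m.yaml` and `python train.py CSX --params configs/111m.yaml` behave identically.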
```yaml
- /path/to/data/location
python_paths:
- /path/to/code/location
trainer:
```
Thanks for this. I understand we were able to run on CPU with these changes, but we will also need to verify that we can run on CSX. @abhis-cerebras, can you please help here? These are changes made by @srinjoym-cerebras to port our gigaGPT model to the new trainer flow. I think he was able to run it on CPU, but we will need to verify these changes on CSX.
```python
causal_attention_mask *= torch.finfo(causal_attention_mask.dtype).min
causal_attention_mask = create_broadcasted_autoregressive_mask(
    batch_size=batch_size,
    num_heads=1,
```
Suggested change:

```diff
-    num_heads=1,
+    num_heads=self.config.heads,
```
Creating the mask explicitly in #5 would be preferable.
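For reference, a dependency-free sketch of what an explicitly constructed additive causal mask looks like. Plain Python lists stand in for the torch tensor here; in real code this would be a tensor broadcast over the batch and head dimensions, which is why the `num_heads` argument above matters:

```python
def explicit_causal_mask(seq_len: int, neg: float = float("-inf")):
    # Row i is the query position; column j is the key position.
    # Future positions (j > i) get a large negative value so softmax
    # assigns them ~zero attention weight; allowed positions get 0.0.
    return [
        [0.0 if j <= i else neg for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = explicit_causal_mask(4)
```

The same mask is valid for every head, so a shape of `[batch, 1, seq, seq]` broadcasts correctly, but materializing it per head changes memory layout rather than values.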
```python
save_checkpoint(step)

logger.info("Training completed successfully!")
from cerebras.modelzoo.common.utils.run.cli_pytorch import get_params_from_args
```
I don't disagree with this change for getting it to work, but I don't think we can continue to claim ~600 lines when we make library calls that hide all the code.
Sorry, could you clarify what you mean by hiding the code?
Sorry for the late reply. The motivation for maintaining gigaGPT is to demonstrate easy model-size scaling on Cerebras hardware using only simple and readable PyTorch code.
If we delegate all the train.py code to a helper function, we can no longer claim that. One could argue that we scale only by hiding the complexity in a helper function, since it is not immediately visible what happens behind cerebras.modelzoo.common.run_utils.main.
```python
def main():
    params = get_params_from_args()

from cerebras.modelzoo.common.run_utils import main
```
Do we need to add the Cerebras Model Zoo dependency to requirements.txt? Please note that this will be a new dependency. cc @gokulr-cerebras
In rel-2.3.0, a new trainer flow and YAML structure were introduced, and the corresponding changes were made to the repo.
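On the dependency question above, a hypothetical helper (the function name and message are illustrative, not from the repo) that checks whether a new dependency such as the Model Zoo package is resolvable before train.py tries to import it:

```python
import importlib.util

def dependency_available(module_name: str) -> bool:
    # find_spec resolves a module without importing it, so this is a cheap
    # way to fail early with a clear message if a dependency is missing.
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name is itself missing.
        return False

if not dependency_available("cerebras.modelzoo"):
    print("cerebras.modelzoo not found; add it to requirements.txt and install it")
```

A check like this keeps the error actionable for users who clone the repo before the requirements.txt update lands.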