- [ ] Review the README before pushing, e.g. [here](https://markdownlivepreview.com/)
- [ ] Dump intermediate data, e.g. the 20M sampled sentences
- [ ] Fix how arguments are passed via the script, to better control the file naming