qshuf is a fast and memory-efficient command-line tool for shuffling very large text files. It uses memory mapping to
minimize RAM usage, making it ideal for AI, machine learning, and data processing tasks that require randomized
datasets.
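
To illustrate the general idea of shuffling through a memory map, here is a minimal Python sketch. It is not qshuf's actual implementation, which this README does not describe; the function name `shuffle_lines` and its parameters are hypothetical. The sketch assumes that only the line offsets are held in RAM, while the line contents stay in the memory-mapped file and are paged in on demand.

```python
import mmap
import os
import random


def shuffle_lines(path, out, seed=None):
    """Write the lines of `path` to the binary stream `out` in random order.

    Illustrative sketch only: names and behavior are assumptions,
    not qshuf's real code.
    """
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be memory-mapped; nothing to shuffle

    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)

        # Record only the start offset of every line.
        offsets = [0]
        pos = mm.find(b"\n")
        while pos != -1:
            offsets.append(pos + 1)
            pos = mm.find(b"\n", pos + 1)

        # A trailing newline does not start a new (empty) line.
        if offsets[-1] == size:
            offsets.pop()

        # Pair each start offset with the start of the next line (or EOF).
        spans = list(zip(offsets, offsets[1:] + [size]))

        # Shuffle the small offset table, not the file contents.
        random.Random(seed).shuffle(spans)

        for start, end in spans:
            line = mm[start:end]
            # Make sure a final line without a newline still ends with one.
            out.write(line if line.endswith(b"\n") else line + b"\n")
```

For example, `shuffle_lines("data.txt", sys.stdout.buffer, seed=42)` (after `import sys`) would print a reproducible shuffle of `data.txt`. Passing the same seed gives the same order within this sketch, but that order is not expected to match what qshuf produces for the same seed.
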
To build qshuf, compile it using CMake:
mkdir build && cd build
cmake ..
make

Shuffle a large file and print to stdout:
qshuf data.txt

Shuffle and save output to a file:
qshuf data.txt -o shuffled.txt

Use a specific random seed:
qshuf data.txt -s 42 > shuffled.txt

This project is licensed under the MIT License. See LICENSE for details.
Created by Davide Caroselli. Contributions welcome!