A Python utility for converting codebases into organized text files, perfect for creating training datasets or documentation from local codebases.
Code2Txt is a simple yet powerful tool that processes local codebases and converts their contents into well-structured text files. It's particularly useful for:
- Creating training datasets for AI/ML models
- Generating documentation from codebases
- Analyzing code structure and patterns
- Converting codebases into readable text format
- Processes multiple directories simultaneously
- Configurable file and directory exclusions
- Optionally skips empty files (controlled by SKIP_EMPTY_FILES)
- Creates individual text files for each source file
- Merges all text files into a single consolidated file
- Preserves file structure and hierarchy
- UTF-8 encoding support
- Cross-platform compatibility
- Python 3.x
- pip (Python package installer)
- Clone this repository:
git clone https://github.com/NjbSyd/code2txt.git
cd code2txt
- Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
The tool is configured through environment variables in the .env file. Here's what each variable does:
- IGNORE_FILES: Comma-separated list of files to ignore
- IGNORE_DIRS: Comma-separated list of directories to ignore
- IGNORE_PATTERNS: Comma-separated list of file patterns to ignore (supports wildcards)
- SOURCE_DIRECTORY: Comma-separated list of directories to process
- SAVE_DIRECTORY: Comma-separated list of directories where output will be saved
- SKIP_EMPTY_FILES: Set to 'TRUE' to skip empty files
Example .env configuration:
IGNORE_FILES=.gitignore,package-lock.json,.DS_Store
IGNORE_DIRS=node_modules,.git,.next
IGNORE_PATTERNS=*.png,*.lock,*.log
SOURCE_DIRECTORY=/path/to/project1,/path/to/project2
SAVE_DIRECTORY=output1,output2
SKIP_EMPTY_FILES=TRUE
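As a rough illustration of how comma-separated values like these could be read, the sketch below splits each variable into a list. The helper names `parse_list` and `load_settings` are hypothetical and are not taken from Code2Txt's source. It reads from a plain mapping (defaulting to `os.environ`) rather than assuming any particular .env loader, and treating SKIP_EMPTY_FILES case-insensitively is also an assumption.

```python
import os


def parse_list(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def load_settings(env=None):
    """Build a settings dict from a mapping of variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "ignore_files": parse_list(env.get("IGNORE_FILES")),
        "ignore_dirs": parse_list(env.get("IGNORE_DIRS")),
        "ignore_patterns": parse_list(env.get("IGNORE_PATTERNS")),
        "source_dirs": parse_list(env.get("SOURCE_DIRECTORY")),
        "save_dirs": parse_list(env.get("SAVE_DIRECTORY")),
        # Assumption: only the value TRUE (compared case-insensitively) enables skipping
        "skip_empty": (env.get("SKIP_EMPTY_FILES") or "").strip().upper() == "TRUE",
    }
```

Stripping whitespace around each item means `a, b` and `a,b` behave the same, which is a forgiving choice for hand-edited configuration.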
- Configure your .env file with the appropriate settings
- Run the script:
python main.py
The script will:
- Process each specified directory
- Create individual text files for each source file
- Merge all text files into a single consolidated file
- Save the output in the specified save directories
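The directory-processing step can be pictured with the minimal sketch below, which walks a source directory and applies the three kinds of exclusion plus the empty-file option. The function name `iter_source_files` and its parameters are hypothetical. Matching wildcard patterns against the file name only (via `fnmatch`) is an assumption about how IGNORE_PATTERNS behaves, not a description of the tool's actual code.

```python
import fnmatch
import os


def iter_source_files(root, ignore_dirs=(), ignore_files=(),
                      ignore_patterns=(), skip_empty=False):
    """Yield paths under root, relative to root, that pass the exclusion rules."""
    for current, dirs, files in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them
        dirs[:] = sorted(d for d in dirs if d not in ignore_dirs)
        for name in sorted(files):
            if name in ignore_files:
                continue
            # Assumption: patterns are matched against the bare file name
            if any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns):
                continue
            full_path = os.path.join(current, name)
            if skip_empty and os.path.getsize(full_path) == 0:
                continue
            yield os.path.relpath(full_path, root)
```

Pruning `dirs` in place keeps large folders such as node_modules from being traversed at all, rather than filtering their files one by one.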
The tool creates two types of output:
- Individual text files for each source file (named with the original path, using '___' as separator)
- A merged file containing all content, with clear separators between files
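To make the '___' naming concrete, here is a small sketch of how a relative source path could be flattened into one file name. `flat_output_name` is a hypothetical helper, and appending `.txt` after the original extension is an assumption. The tool's exact naming may differ.

```python
def flat_output_name(relative_path, separator="___"):
    """Turn a relative source path into a single flat .txt file name."""
    # Normalize both kinds of path separator so the result is platform-independent
    parts = relative_path.replace("\\", "/").split("/")
    return separator.join(part for part in parts if part) + ".txt"
```

Under these assumptions, `flat_output_name("src/utils/helpers.js")` gives `src___utils___helpers.js.txt`.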
Example merged file format:
--- Start: path/to/file1.js ---
[file content]
--- End: path/to/file1.js ---
--- Start: path/to/file2.js ---
[file content]
--- End: path/to/file2.js ---
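A merged file in the format shown above could be assembled roughly as follows. The function `format_merged` is hypothetical, and details such as trimming trailing newlines from each file's content are assumptions rather than documented behavior.

```python
def format_merged(entries):
    """Join (relative_path, content) pairs using Start/End marker lines."""
    blocks = []
    for path, content in entries:
        body = content.rstrip("\n")  # avoid a blank line before the End marker
        blocks.append(f"--- Start: {path} ---\n{body}\n--- End: {path} ---")
    return "\n".join(blocks) + "\n"
```

Because every block repeats the path in both markers, a consumer can split the merged file back into per-file sections by scanning for the `--- Start:` and `--- End:` lines.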
- File Not Found Error
  - Ensure directories exist and paths are correct
  - Check for typos in directory paths
- Permission Errors
  - Ensure you have read access to source directories
  - Ensure you have write access to save directories
- Encoding Errors
  - The tool uses UTF-8 encoding
  - If you encounter encoding issues, ensure your source files are UTF-8 compatible
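When tracking down encoding errors, it can help to find which files are not valid UTF-8 before running the tool. The sketch below is a standalone check, not part of Code2Txt, and the name `find_non_utf8` is hypothetical.

```python
def find_non_utf8(paths):
    """Return the paths whose bytes cannot be decoded as UTF-8."""
    bad = []
    for path in paths:
        with open(path, "rb") as handle:
            data = handle.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Files reported by this check can then be re-saved as UTF-8 or added to the ignore settings.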
- Always use absolute paths in the .env file
- Keep your .env file secure and don't commit it to version control
- Use the virtual environment to avoid dependency conflicts
- Regularly update the ignore patterns to match your project's needs
Contributions are welcome! Please feel free to submit a Pull Request.
This project is released under the MIT License.
For support, please open an issue in the GitHub repository.