NjbSyd/code2txt

Code2Txt

A Python utility for converting codebases into organized text files, perfect for creating training datasets or documentation from local codebases.

Overview

Code2Txt processes local codebases and converts their contents into structured text files. It's particularly useful for:

  • Creating training datasets for AI/ML models
  • Generating documentation from codebases
  • Analyzing code structure and patterns
  • Converting codebases into readable text format

Features

  • Processes multiple directories simultaneously
  • Configurable file and directory exclusions
  • Optionally skips empty files (see SKIP_EMPTY_FILES)
  • Creates individual text files for each source file
  • Merges all text files into a single consolidated file
  • Preserves file structure and hierarchy
  • UTF-8 encoding support
  • Cross-platform compatibility

Prerequisites

  • Python 3.x
  • pip (Python package installer)

Installation

  1. Clone this repository:
git clone https://github.com/NjbSyd/code2txt.git
cd code2txt
  2. Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Configuration

The tool is configured through environment variables in the .env file. Here's what each variable does:

  • IGNORE_FILES: Comma-separated list of files to ignore
  • IGNORE_DIRS: Comma-separated list of directories to ignore
  • IGNORE_PATTERNS: Comma-separated list of file patterns to ignore (supports wildcards)
  • SOURCE_DIRECTORY: Comma-separated list of directories to process
  • SAVE_DIRECTORY: Comma-separated list of directories where output will be saved
  • SKIP_EMPTY_FILES: Set to 'TRUE' to skip empty files
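
To illustrate how the exact-name and wildcard ignore lists could work together, here is a minimal sketch. It is not the project's actual code: the function name `should_ignore` is hypothetical, and matching wildcards with the standard library's `fnmatch` module is an assumption about how IGNORE_PATTERNS is interpreted.

```python
import fnmatch


def should_ignore(name, ignore_names, ignore_patterns):
    """Return True if a file name should be skipped.

    Hypothetical helper: `ignore_names` holds exact names (as in IGNORE_FILES),
    `ignore_patterns` holds shell-style wildcards (as in IGNORE_PATTERNS).
    """
    if name in ignore_names:
        return True
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in ignore_patterns)


if __name__ == "__main__":
    names = {".gitignore", "package-lock.json"}
    patterns = ["*.png", "*.lock", "*.log"]
    for candidate in ("logo.png", "main.py", ".gitignore"):
        print(candidate, should_ignore(candidate, names, patterns))
```

The same check could be applied to directory names against IGNORE_DIRS before descending into them.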

Example .env configuration:

IGNORE_FILES=.gitignore,package-lock.json,.DS_Store
IGNORE_DIRS=node_modules,.git,.next
IGNORE_PATTERNS=*.png,*.lock,*.log
SOURCE_DIRECTORY=/path/to/project1,/path/to/project2
SAVE_DIRECTORY=output1,output2
SKIP_EMPTY_FILES=TRUE
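
As a rough sketch of how these values might be read, the snippet below splits the comma-separated variables into lists. It assumes the variables are already present in the process environment (however the .env file gets loaded); the names `parse_list` and `load_settings` and the returned dictionary keys are hypothetical, not taken from the project.

```python
import os


def parse_list(value):
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env=os.environ):
    """Collect the documented variables into a plain dict (hypothetical layout)."""
    return {
        "ignore_files": parse_list(env.get("IGNORE_FILES", "")),
        "ignore_dirs": parse_list(env.get("IGNORE_DIRS", "")),
        "ignore_patterns": parse_list(env.get("IGNORE_PATTERNS", "")),
        "source_directories": parse_list(env.get("SOURCE_DIRECTORY", "")),
        "save_directories": parse_list(env.get("SAVE_DIRECTORY", "")),
        # Only the literal value TRUE enables skipping, as documented above.
        "skip_empty_files": env.get("SKIP_EMPTY_FILES", "").strip() == "TRUE",
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to try without touching the real environment.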

Usage

  1. Configure your .env file with the appropriate settings
  2. Run the script:
python main.py

The script will:

  1. Process each specified directory
  2. Create individual text files for each source file
  3. Merge all text files into a single consolidated file
  4. Save the output in the specified save directories
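
The first step, walking a source directory while honoring the ignore settings, could look roughly like the sketch below. It is an illustration only: `iter_source_files` is a hypothetical name, and using `os.walk` with in-place pruning is an assumption about the approach, not a description of main.py.

```python
import os


def iter_source_files(root, ignore_dirs=(), skip_empty=False):
    """Yield paths relative to `root` for every file that should be processed."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into ignored dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in ignore_dirs)
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            if skip_empty and os.path.getsize(full_path) == 0:
                continue
            yield os.path.relpath(full_path, root)


if __name__ == "__main__":
    for relative_path in iter_source_files(".", ignore_dirs={".git", "node_modules"}):
        print(relative_path)
```

Sorting the directory and file names keeps the output order stable between runs, which makes merged files easier to diff.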

Output Format

The tool creates two types of output:

  1. Individual text files for each source file (named after the original path, with '___' in place of the path separator)
  2. A merged file containing all content, with clear separators between files
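
The '___' naming scheme can be sketched as follows. The function name `flat_name` is hypothetical, and appending a `.txt` suffix is an assumption; the README only states that the original path is used with '___' as the separator.

```python
from pathlib import PurePosixPath


def flat_name(relative_path, separator="___", suffix=".txt"):
    """Flatten a relative path into a single file name.

    The ".txt" suffix is an assumption for illustration, not confirmed behavior.
    """
    # Normalize Windows-style separators so both path styles flatten the same way.
    parts = PurePosixPath(relative_path.replace("\\", "/")).parts
    return separator.join(parts) + suffix


if __name__ == "__main__":
    print(flat_name("src/utils/helpers.js"))  # src___utils___helpers.js.txt
```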

Example merged file format:

--- Start: path/to/file1.js ---
[file content]
---   End: path/to/file1.js  ---

--- Start: path/to/file2.js ---
[file content]
---   End: path/to/file2.js  ---
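
For readers who want to produce or test against this layout, here is a minimal sketch that renders the same Start/End markers. The names `format_block` and `merge` are hypothetical, and the blank line between blocks is inferred from the example above.

```python
def format_block(path, content):
    """Wrap one file's content in the Start/End markers shown above."""
    return f"--- Start: {path} ---\n{content}\n---   End: {path}  ---\n"


def merge(files):
    """Join (path, content) pairs into one string, one blank line between blocks."""
    return "\n".join(format_block(path, content) for path, content in files)


if __name__ == "__main__":
    print(merge([
        ("path/to/file1.js", "[file content]"),
        ("path/to/file2.js", "[file content]"),
    ]))
```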

Troubleshooting

Common Issues

  1. File Not Found Error

    • Ensure directories exist and paths are correct
    • Check for typos in directory paths
  2. Permission Errors

    • Ensure you have read access to source directories
    • Ensure you have write access to save directories
  3. Encoding Errors

    • The tool uses UTF-8 encoding
    • If you encounter encoding issues, ensure your source files are UTF-8 compatible
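
One way to find the files behind an encoding error is to check each one for valid UTF-8 before running the tool. The sketch below is a standalone diagnostic, not part of Code2Txt; the function name `is_utf8` is hypothetical.

```python
def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    import sys

    # Usage: python check_encoding.py file1 file2 ...
    for candidate in sys.argv[1:]:
        if not is_utf8(candidate):
            print(f"not valid UTF-8: {candidate}")
```

Files flagged this way can be re-saved as UTF-8 or excluded through IGNORE_FILES or IGNORE_PATTERNS.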

Best Practices

  1. Always use absolute paths in the .env file
  2. Keep your .env file secure and don't commit it to version control
  3. Use the virtual environment to avoid dependency conflicts
  4. Regularly update the ignore patterns to match your project's needs
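
The first practice can be checked before a run. The sketch below reports configured directories that are not absolute or do not exist; `check_directories` is a hypothetical helper, not something the project provides.

```python
import os


def check_directories(paths):
    """Return a list of human-readable problems found in the given paths."""
    problems = []
    for path in paths:
        if not os.path.isabs(path):
            problems.append(f"not an absolute path: {path}")
        elif not os.path.isdir(path):
            problems.append(f"directory does not exist: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_directories(["/path/to/project1", "output1"]):
        print(problem)
```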

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is released under the MIT License.

Support

For support, please open an issue in the GitHub repository.
