A C++ utility that combines multiple source files into a single text file while respecting .gitignore rules and handling binary file detection.
- Recursively processes directories and combines text files
- Respects `.gitignore` patterns, including:
  - Leading `/` for anchored paths
  - Trailing `/` for directory-only matches
  - `*` and `**` wildcards
  - Negation with `!`
- Automatically detects and skips binary files
- Preserves file paths in the combined output
- Handles nested `.gitignore` files
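
As a rough illustration of the pattern features listed above, the sketch below turns one pattern into a regex plus flags and then evaluates a path with last-match-wins semantics, as Git does. `Rule`, `parseRule`, and `isIgnoredSketch` are hypothetical names, not the project's actual code. The translation is deliberately simplified: it ignores backslash escapes, character classes, the special meaning of a leading `**/`, and the rule that a pattern containing an inner `/` is anchored.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed .gitignore rule.
struct Rule {
    std::regex re;
    bool negate = false;        // pattern started with '!'
    bool directoryOnly = false; // pattern ended with '/'
    bool anchored = false;      // pattern started with '/'
};

// Translate one .gitignore line into a Rule (assumption: no escapes or [] classes).
Rule parseRule(std::string pattern) {
    Rule r;
    if (!pattern.empty() && pattern.front() == '!') { r.negate = true; pattern.erase(0, 1); }
    if (!pattern.empty() && pattern.front() == '/') { r.anchored = true; pattern.erase(0, 1); }
    if (!pattern.empty() && pattern.back() == '/') { r.directoryOnly = true; pattern.pop_back(); }

    static const std::string special = ".+()[]{}^$|\\";
    // Unanchored patterns may match at any directory depth.
    std::string rx = r.anchored ? "^" : "^(.*/)?";
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '*') {
            if (i + 1 < pattern.size() && pattern[i + 1] == '*') {
                rx += ".*";    // '**' may cross directory separators
                ++i;
            } else {
                rx += "[^/]*"; // '*' stays within one path component
            }
        } else if (c == '?') {
            rx += "[^/]";
        } else if (special.find(c) != std::string::npos) {
            rx += '\\';
            rx += c;
        } else {
            rx += c;
        }
    }
    rx += "$";
    r.re = std::regex(rx);
    return r;
}

// relPath is relative to the directory holding the .gitignore and uses '/' separators.
// The last matching rule decides; a negated rule re-includes the path.
bool isIgnoredSketch(const std::vector<Rule> &rules, const std::string &relPath, bool isDir) {
    bool ignored = false;
    for (const Rule &r : rules) {
        if (r.directoryOnly && !isDir)
            continue;
        if (std::regex_match(relPath, r.re))
            ignored = !r.negate;
    }
    return ignored;
}
```

With the rules `*.log`, `!keep.log`, and `/build/`, this sketch ignores `src/debug.log`, re-includes `src/keep.log`, ignores the top-level directory `build`, and leaves the nested directory `src/build` alone.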

Requirements:
- CMake 3.10 or higher
- C++20 compatible compiler
- Visual Studio Build Tools (for Windows)
Run the provided PowerShell build script:
.\build.ps1

The script will:
- Initialize VS Developer Shell
- Configure CMake for x64
- Build the project
- Copy the executable to the project root

To build manually with CMake instead:
cmake -B build
cmake --build build --config Release

Usage:
ProjectCompressor <directory_path>

The program will:
- Scan the specified directory and its subdirectories
- Process all text files while respecting `.gitignore` rules
- Create a `combined.txt` file containing all the processed files

Each file appears in `combined.txt` under a header line with its path:
# File: /path/to/source/file1.cpp
[contents of file1.cpp]
# File: /path/to/source/file2.hpp
[contents of file2.hpp]

Each .gitignore pattern is parsed into a rule that stores a compiled regex together with its negation, directory-only, and anchoring flags:
#include <regex>
#include <string>

// A structure representing a single .gitignore rule.
struct GitIgnoreRule {
    std::regex patternRegex;     // The pattern converted to a regex.
    bool negate = false;         // True if the rule starts with '!'.
    bool directoryOnly = false;  // True if the pattern ends with '/'.
    bool anchored = false;       // True if the pattern starts with '/'.
    std::string originalPattern; // The original pattern text.
};

Binary files are detected with a heuristic that samples the first 512 bytes of each file:
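#include <cstddef>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Returns true if more than 30% of the sampled bytes are non-printable.
bool isBinaryFile(const fs::path &filePath) {
    std::ifstream in(filePath, std::ios::binary);
    if (!in)
        return false; // Unreadable files are reported later by the caller.

    const std::size_t sampleSize = 512;
    char buffer[sampleSize];
    in.read(buffer, sampleSize);
    std::streamsize bytesRead = in.gcount();
    if (bytesRead == 0)
        return false; // Empty files are treated as text.

    int nonPrintable = 0;
    for (std::streamsize i = 0; i < bytesRead; ++i) {
        unsigned char c = static_cast<unsigned char>(buffer[i]);
        // Printable ASCII, tab, line feed, and carriage return count as text.
        if (!((c >= 32 && c <= 126) || c == 9 || c == 10 || c == 13))
            nonPrintable++;
    }
    return (static_cast<double>(nonPrintable) / bytesRead) > 0.30;
}

Directories are processed recursively, skipping anything matched by the ignore rules: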
void processDirectory(const fs::path &dir,
                      std::ofstream &out,
                      const std::vector<GitIgnoreRule> &rules,
                      const fs::path &baseDir)
{
    for (const auto &entry : fs::directory_iterator(dir)) {
        if (!entry.exists())
            continue;
        fs::path path = entry.path();
        // Skip the ignore files themselves and the tool's own output.
        if (path.filename() == ".gitignore" || path.filename() == "combined.txt")
            continue;
        if (isIgnored(rules, baseDir, path))
            continue;
        if (fs::is_directory(path)) {
            processDirectory(path, out, rules, baseDir);
        } else {
            if (isBinaryFile(path)) {
                std::cerr << "Skipping binary file: " << path << "\n";
                continue;
            }
            out << "# File: " << path.string() << "\n\n";
            std::ifstream inFile(path);
            if (!inFile) {
                std::cerr << "Failed to open file: " << path << "\n";
                continue;
            }
            // Streaming an empty rdbuf() sets failbit on `out`, so skip empty files.
            if (inFile.peek() != std::ifstream::traits_type::eof())
                out << inFile.rdbuf();
            out << "\n\n";
        }
    }
}

To contribute:
- Fork the repository
- Suggest or make improvements
- Create a new Pull Request

Known limitations:
- Does not handle all edge cases of `.gitignore` pattern matching
- Binary file detection is heuristic-based and may produce false positives or false negatives
- Large directories with many files may consume significant memory