This repository provides production‑ready code examples for implementing batch Word document comparison using GroupDocs.Comparison for Node.js via Java. Built with Node.js 20 LTS, these examples demonstrate optimized batch processing, parallel execution, and performance monitoring. Designed for developers who need to efficiently compare large sets of Word files.
- Platform: Node.js 20 LTS
- Product: GroupDocs.Comparison
- Language: JavaScript (Node.js)
- Framework: None (plain Node.js runtime)
Developers working with large collections of Word documents often need to detect differences across versions, regulatory revisions, or content updates. Performing a manual side‑by‑side review is error‑prone and does not scale. Traditional line‑by‑line diff tools cannot handle the rich structure of DOCX files, leading to missed formatting changes or broken layout detection. Moreover, processing thousands of document pairs sequentially can take hours, consuming valuable compute resources.
GroupDocs.Comparison simplifies this challenge by offering a high‑level API that understands Word document internals, automatically detects textual, structural, and formatting changes, and produces a visual comparison document. The library abstracts away low‑level XML handling, delivering reliable, repeatable results while exposing performance‑tuning options.
GroupDocs.Comparison addresses these challenges through a native‑accelerated comparison engine accessed via a thin Node.js wrapper. The library provides a clear, promise‑based API that can be orchestrated in sequential or parallel workflows. Key technical features include:
- High‑Fidelity Comparison: Detects text, style, images, tables, and footnotes.
- Batch Processing API: Helper functions to locate matching file pairs and drive bulk operations.
- Parallel Execution: Configurable concurrency to fully utilize multi‑core CPUs.
- Progress Callbacks: Real‑time feedback for long‑running jobs.
- Performance Options: Adjustable sensitivity, optional summary pages, and memory‑friendly modes.
Before running these examples, ensure you have:
- Node.js – 20 LTS or later (node --version)
- Java Runtime – JRE/JDK 8+ (recommended 17 LTS) (java -version)
- JAVA_HOME – Environment variable pointing to your JDK installation
- GroupDocs.Comparison npm package – Installed via npm install
- Temporary License – Obtain from the temporary‑license badge link if you do not have a permanent key
- Run npm install to install the dependencies.
- Set JAVA_HOME to point at your JDK directory.
- If you have a permanent license, replace the placeholder in src/utils/licenseHelper.js with your license string.
- Optionally adjust compareOptions in the Optimized Batch Comparison example to control sensitivity or enable summary pages.
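The prerequisites above can be checked up front, before a batch run starts. The following is a minimal sketch and not part of the repository: checkEnvironment is a hypothetical helper name, and the minimum Node.js major version (20) is taken from the prerequisites list.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of the repository): returns a list of
// human-readable problems with the runtime environment; empty means OK.
function checkEnvironment(env = process.env, nodeVersion = process.versions.node) {
  const problems = [];

  // Node.js 20 LTS or later is listed as a prerequisite.
  const major = Number.parseInt(nodeVersion.split('.')[0], 10);
  if (Number.isNaN(major) || major < 20) {
    problems.push(`Node.js 20 or later is required, found ${nodeVersion}`);
  }

  // JAVA_HOME must be set and point at an existing directory.
  const javaHome = env.JAVA_HOME;
  if (!javaHome) {
    problems.push('JAVA_HOME is not set');
  } else if (!fs.existsSync(javaHome)) {
    problems.push(`JAVA_HOME points to a missing directory: ${javaHome}`);
  }

  return problems;
}

const problems = checkEnvironment();
if (problems.length > 0) {
  console.error(problems.join('\n'));
}
```

Returning a list instead of throwing on the first problem lets a script report every missing prerequisite in one pass.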
groupdocs-comparison-batch-word-nodejs/
│
├── package.json
├── README.md
├── src/
│   ├── batchComparison.js              # Core batch comparison functions
│   ├── examples/
│   │   ├── basicBatchComparison.js     # Sequential processing demo
│   │   ├── parallelBatchComparison.js  # Parallel processing demo
│   │   ├── optimizedBatchComparison.js # Performance‑tuned demo
│   │   ├── batchWithProgress.js        # Progress‑tracking demo
│   │   └── performanceBenchmark.js     # Benchmarking demo
│   └── utils/
│       ├── fileHelper.js               # File utilities
│       ├── licenseHelper.js            # License handling
│       ├── performanceMonitor.js       # Monitoring helpers
│       └── constants.js                # Shared constants
├── sample-files/                       # Input documents
│   ├── source/
│   └── target/
└── output/                             # Generated comparison results
- package.json – Project metadata and npm dependencies.
- src/batchComparison.js – Implements compareWordPair, batch sequential/parallel flows, pair discovery, and report generation.
- src/examples/basicBatchComparison.js – Demonstrates a simple sequential batch run.
- src/examples/parallelBatchComparison.js – Shows how to run comparisons in parallel with configurable concurrency.
- src/examples/optimizedBatchComparison.js – Applies CompareOptions for speed/accuracy trade‑offs.
- src/examples/batchWithProgress.js – Provides real‑time progress feedback via a console progress bar.
- src/examples/performanceBenchmark.js – Benchmarks sequential vs parallel strategies and prints the best‑performing configuration.
- src/utils/ – Helper modules for file I/O, license loading, performance timing, and constant definitions.
Implementation: Compares a single pair of Word documents and generates a comparison result document.
This function validates the existence of the source and target files, ensures the output directory exists, creates a Comparer instance, adds the target file, runs the comparison (optionally with custom options), and returns a metadata object containing the operation duration, file size, and any error information.
const fs = require('fs');
const path = require('path');
// Assumption: the package name is not stated in this README; use the one listed in package.json.
const groupdocs = require('@groupdocs/groupdocs.comparison');

async function compareWordPair(sourcePath, targetPath, outputPath, options = {}) {
  const startTime = Date.now();
  try {
    // Validate that both input files exist
    if (!fs.existsSync(sourcePath)) {
      throw new Error(`Source file not found: ${sourcePath}`);
    }
    if (!fs.existsSync(targetPath)) {
      throw new Error(`Target file not found: ${targetPath}`);
    }
    // Ensure the output directory exists
    const outputDir = path.dirname(outputPath);
    if (!fs.existsSync(outputDir)) {
      fs.mkdirSync(outputDir, { recursive: true });
    }
    // Initialize the comparer with the source and register the target
    const comparer = new groupdocs.Comparer(sourcePath);
    comparer.add(targetPath);
    // Perform the comparison, passing custom options only when supplied
    const compareOptions = options.compareOptions || null;
    if (compareOptions) {
      await comparer.compare(outputPath, compareOptions);
    } else {
      await comparer.compare(outputPath);
    }
    const duration = Date.now() - startTime;
    const fileSize = fs.existsSync(outputPath) ? fs.statSync(outputPath).size : 0;
    return {
      success: true,
      sourcePath,
      targetPath,
      outputPath,
      duration,
      fileSize,
      error: null
    };
  } catch (error) {
    // Report the failure in the same shape as a success instead of rethrowing
    const duration = Date.now() - startTime;
    return {
      success: false,
      sourcePath,
      targetPath,
      outputPath,
      duration,
      fileSize: 0,
      error: error.message
    };
  }
}
- Comparer – Wrapper around the native GroupDocs.Comparison engine; instantiated with the source document path.
- add() – Registers the target document for comparison.
- compare() – Executes the comparison and writes a result DOCX; can accept a CompareOptions object to tune sensitivity, generate summary pages, etc.
- Error handling – Catches any I/O or API errors, returning a consistent result shape.
- Performance – Measures elapsed time with Date.now() and reports the output file size.
Key Components:
- compareWordPair: Core function handling a single comparison.
- fs & path: Node.js built‑ins for file system interactions.
- groupdocs.Comparer: Main API class from the npm package.
Parameters:
- sourcePath – Full path to the original document.
- targetPath – Full path to the document to compare against.
- outputPath – Destination for the generated comparison file.
- options – Optional object containing compareOptions for fine‑tuning.
Output: Returns a JSON‑compatible object summarising success, timing, and size.
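Because compareWordPair catches its own errors and reports them in the returned object, callers check result.success instead of wrapping the call in try/catch. The sketch below shows one way to consume that result shape; describeResult is a hypothetical helper, and the sample objects are made-up values with the same fields.

```javascript
// Hypothetical helper (not part of the repository): turns the result object
// described above into a single log line.
function describeResult(result) {
  if (!result.success) {
    return `FAILED ${result.sourcePath}: ${result.error} (${result.duration} ms)`;
  }
  const sizeKb = (result.fileSize / 1024).toFixed(1);
  return `OK ${result.sourcePath} -> ${result.outputPath} (${result.duration} ms, ${sizeKb} KB)`;
}

// Made-up sample objects with the same fields that compareWordPair returns.
const ok = {
  success: true,
  sourcePath: 'source/a.docx',
  targetPath: 'target/a.docx',
  outputPath: 'output/comparison_a.docx',
  duration: 850,
  fileSize: 20480,
  error: null
};
const bad = { ...ok, success: false, fileSize: 0, error: 'Target file not found: target/a.docx' };

console.log(describeResult(ok));
console.log(describeResult(bad));
```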
The compareBatchSequential function iterates over an array of document pair descriptors, invoking compareWordPair for each pair in turn. It aggregates the results, tracks processing statistics, optionally reports progress via a callback, and returns a summary object containing totals, average duration, and the individual results.
// Assumes path and compareWordPair are available in the same module.
async function compareBatchSequential(documentPairs, options = {}, progressCallback = null) {
  const startTime = Date.now();
  const results = [];
  let processed = 0;
  let succeeded = 0;
  let failed = 0;
  for (const pair of documentPairs) {
    // One comparison at a time; the next starts only after this one settles
    const result = await compareWordPair(
      pair.source,
      pair.target,
      pair.output,
      options
    );
    results.push(result);
    processed++;
    if (result.success) {
      succeeded++;
    } else {
      failed++;
      console.error(`✗ [${processed}/${documentPairs.length}] ${path.basename(pair.source)} - ${result.error}`);
    }
    if (progressCallback) {
      progressCallback({
        processed,
        total: documentPairs.length,
        succeeded,
        failed,
        percentage: Math.round((processed / documentPairs.length) * 100)
      });
    }
  }
  const totalDuration = Date.now() - startTime;
  // Guard against an empty batch, which would otherwise yield NaN
  const avgDuration = results.length > 0
    ? results.reduce((sum, r) => sum + r.duration, 0) / results.length
    : 0;
  return {
    total: documentPairs.length,
    succeeded,
    failed,
    totalDuration,
    avgDuration,
    results
  };
}
- Sequential loop – Guarantees only one comparison runs at a time, keeping memory usage minimal.
- Progress callback – Supports UI or console progress bars; invoked once after each pair is processed.
- Result aggregation – Collects each pair's metadata for later reporting or storing as JSON.
- Error resilience – Failures are logged but do not abort the whole batch.
Key Components:
- compareWordPair: Re‑used for each iteration.
- progressCallback: Optional user‑supplied function for real‑time feedback.
Parameters:
- documentPairs – Array of {source, target, output} objects.
- options – Optional comparison options passed through to each pair.
- progressCallback – Optional function invoked with processing stats.
Output: Summary object with totals, timing, and the list of per‑pair results.
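A progress callback receives the stats object built in the loop above (processed, total, succeeded, failed, percentage). The following is a minimal sketch of a console progress bar built on that object; renderProgressBar and consoleProgress are hypothetical names, not the implementation in batchWithProgress.js.

```javascript
// Hypothetical helper: builds a text progress bar from the stats object
// passed to progressCallback.
function renderProgressBar(stats, width = 20) {
  const filled = Math.round((stats.percentage / 100) * width);
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled);
  return `[${bar}] ${stats.percentage}% (${stats.processed}/${stats.total}, ${stats.failed} failed)`;
}

// A function suitable for passing as progressCallback: it redraws a single
// console line and ends it with a newline once the batch is complete.
function consoleProgress(stats) {
  process.stdout.write(`\r${renderProgressBar(stats)}`);
  if (stats.processed === stats.total) {
    process.stdout.write('\n');
  }
}

// Simulate the callbacks of a three-document run with one failure.
consoleProgress({ processed: 1, total: 3, succeeded: 1, failed: 0, percentage: 33 });
consoleProgress({ processed: 2, total: 3, succeeded: 1, failed: 1, percentage: 67 });
consoleProgress({ processed: 3, total: 3, succeeded: 2, failed: 1, percentage: 100 });
```

Keeping the string formatting separate from the console write makes the bar easy to reuse in a log file or a UI.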
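The compareBatchParallel function processes the input array in chunks of concurrency pairs, running each chunk through Promise.all so that its comparisons execute concurrently. Each chunk must finish before the next one starts, which caps the number of simultaneous comparisons and avoids overwhelming the system. Progress is reported as each chunk completes, and the returned summary matches the sequential version with the concurrency value added.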
// Assumes path and compareWordPair are available in the same module.
// Assumption: the parameter order and the default concurrency of 3 are not stated in this README.
async function compareBatchParallel(documentPairs, options = {}, concurrency = 3, progressCallback = null) {
  const startTime = Date.now();
  const results = [];
  let processed = 0;
  let succeeded = 0;
  let failed = 0;
  // Process in batches to control concurrency
  for (let i = 0; i < documentPairs.length; i += concurrency) {
    const batch = documentPairs.slice(i, i + concurrency);
    // Run every comparison in this batch concurrently and wait for all of them
    const batchResults = await Promise.all(
      batch.map(pair => compareWordPair(pair.source, pair.target, pair.output, options))
    );
    for (const result of batchResults) {
      results.push(result);
      processed++;
      if (result.success) {
        succeeded++;
      } else {
        failed++;
        console.error(`✗ [${processed}/${documentPairs.length}] ${path.basename(result.sourcePath)} - ${result.error}`);
      }
      if (progressCallback) {
        progressCallback({
          processed,
          total: documentPairs.length,
          succeeded,
          failed,
          percentage: Math.round((processed / documentPairs.length) * 100)
        });
      }
    }
    // Pause briefly between batches, but not after the last one
    if (i + concurrency < documentPairs.length) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  }
  const totalDuration = Date.now() - startTime;
  // Guard against an empty batch, which would otherwise yield NaN
  const avgDuration = results.length > 0
    ? results.reduce((sum, r) => sum + r.duration, 0) / results.length
    : 0;
  return {
    total: documentPairs.length,
    succeeded,
    failed,
    totalDuration,
    avgDuration,
    concurrency,
    results
  };
}
- Batch concurrency control – The concurrency parameter caps the number of simultaneous compareWordPair promises, preventing excessive memory pressure.
- Small inter‑batch delay – Adds a 100 ms pause between batches to give the OS time to flush I/O buffers.
- Aggregated metrics – Includes concurrency in the final summary for traceability.
- Error handling – Mirrors the sequential version; individual failures are logged without aborting other tasks.
Key Components:
- Promise.all: Executes a batch of comparisons in parallel.
- concurrency: User‑defined limit controlling parallelism.
Parameters: Same as the sequential version, plus concurrency.
Output: Provides total duration, average per‑document time, and the concurrency level used.
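The batch‑by‑batch loop waits for the slowest comparison in each batch before starting the next, so faster slots sit idle. A different technique, shown here only as an alternative sketch and not what the repository's code does, is a worker pool that starts the next item as soon as any slot frees up. runWithConcurrency is a hypothetical name, and the demo worker is a timer standing in for compareWordPair.

```javascript
// Hypothetical alternative to the batch loop: a simple worker pool.
// Each lane pulls the next item as soon as it finishes the previous one.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      // No await between the check and the increment, so two lanes
      // can never claim the same index.
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as items
}

// Demo with a stand-in worker; in practice the worker could call
// compareWordPair(pair.source, pair.target, pair.output, options).
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
let active = 0;
let peak = 0;
const demo = runWithConcurrency([30, 10, 20, 10], 2, async ms => {
  active++;
  peak = Math.max(peak, active);
  await sleep(ms);
  active--;
  return ms * 2;
});
demo.then(results => console.log(results, 'peak concurrency:', peak));
```

This pool relies on the worker never rejecting, which holds for compareWordPair because it returns failures as result objects; a worker that can throw would need its own try/catch inside the lane.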
The findWordPairs function scans the source directory, keeps the .docx and .doc files, and builds a pair object for every file whose name (including extension) also exists in the target directory. It also constructs the output path for each comparison result.
// Assumes fs and path are required at the top of the module.
function findWordPairs(sourceDir, targetDir, outputDir) {
  if (!fs.existsSync(sourceDir)) {
    throw new Error(`Source directory not found: ${sourceDir}`);
  }
  if (!fs.existsSync(targetDir)) {
    throw new Error(`Target directory not found: ${targetDir}`);
  }
  const sourceFiles = fs.readdirSync(sourceDir)
    // Keep Word documents only (case-insensitive extension check)
    .filter(f => f.toLowerCase().endsWith('.docx') || f.toLowerCase().endsWith('.doc'))
    .map(f => {
      const baseName = path.basename(f, path.extname(f));
      return {
        name: f,
        source: path.join(sourceDir, f),
        // The target is expected under the same file name, extension included
        target: path.join(targetDir, f),
        output: path.join(outputDir, `comparison_${baseName}.docx`)
      };
    })
    .filter(f => fs.existsSync(f.target)); // Only include pairs where target exists
  return sourceFiles;
}
- Directory validation – Throws early if either input directory is missing.
- Extension filter – Accepts both .docx and legacy .doc formats.
- File‑name matching – Pairs documents with identical filenames, extension included, so report.doc and report.docx are not matched.
- Automatic output naming – Prefixes comparison_ and ensures a .docx result.
Key Components:
- fs.readdirSync: Reads directory entries.
- path.basename / path.extname: Manipulate filenames.
Parameters:
- sourceDir, targetDir, outputDir – Paths to the respective folders.
Output: Array of objects containing source, target, and output paths for each matched pair.
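The pairing code above joins the target directory with the full source file name, so a report.doc in the source folder is not paired with a report.docx in the target folder. If matching across the two extensions is wanted, one possible variation is to key files by base name. The sketch below is hypothetical (pairByBaseName is not a repository function) and works on plain file‑name arrays so it can be tried without touching the file system.

```javascript
const path = require('path');

const WORD_EXTENSIONS = new Set(['.doc', '.docx']);

// Hypothetical variation (not the repository's findWordPairs): pairs file
// names by base name, ignoring case and the .doc/.docx distinction.
function pairByBaseName(sourceNames, targetNames) {
  const isWord = name => WORD_EXTENSIONS.has(path.extname(name).toLowerCase());
  const baseOf = name => path.basename(name, path.extname(name)).toLowerCase();

  // Index target files by base name; if both x.doc and x.docx exist,
  // the one listed later wins.
  const targetsByBase = new Map();
  for (const name of targetNames.filter(isWord)) {
    targetsByBase.set(baseOf(name), name);
  }

  // Keep source files that have a counterpart, preserving source order.
  return sourceNames
    .filter(isWord)
    .filter(name => targetsByBase.has(baseOf(name)))
    .map(name => ({ source: name, target: targetsByBase.get(baseOf(name)) }));
}

console.log(pairByBaseName(['a.docx', 'b.doc', 'notes.txt'], ['a.docx', 'b.docx', 'c.docx']));
```

In a real run the two arrays would come from fs.readdirSync on the source and target directories.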
The generateSummaryReport function receives the aggregated batch result object and produces a multi‑line formatted string summarising total documents, successes, failures, success rate, total run time, average per‑document duration, throughput, and the concurrency strategy used.
function generateSummaryReport(batchResults) {
  const { total, succeeded, failed, totalDuration, avgDuration, concurrency } = batchResults;
  // Guard against division by zero for empty or instantaneous runs
  const successRate = total > 0 ? (succeeded / total) * 100 : 0;
  const throughput = totalDuration > 0 ? succeeded / (totalDuration / 1000) : 0;
  const report = `
================================================================================
Batch Comparison Summary
================================================================================
Total Documents: ${total}
Successful: ${succeeded}
Failed: ${failed}
Success Rate: ${successRate.toFixed(2)}%
Performance Metrics:
Total Duration: ${(totalDuration / 1000).toFixed(2)}s
Average Duration: ${avgDuration.toFixed(2)}ms per document
Throughput: ${throughput.toFixed(2)} documents/second
${concurrency ? `Concurrency: ${concurrency}` : 'Processing: Sequential'}
================================================================================
`;
  return report;
}
- Template literals – Build a human‑readable report with aligned columns.
- Dynamic concurrency label – Shows either the concurrency value or marks the run as sequential.
- Metrics calculations – Derive success percentage, average duration, and throughput.
Key Components:
- batchResults: Object produced by the sequential or parallel batch functions.
Parameters:
- batchResults – Summary of the batch run.
Output: Multi‑line string suitable for console output or log files.
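As a worked example of the arithmetic behind the report, the sketch below computes the success rate and throughput on their own. computeMetrics is a hypothetical helper and the input numbers are made up for illustration.

```javascript
// Hypothetical helper: computes the derived figures shown in the summary
// report, returning 0 instead of NaN or Infinity for empty or instantaneous runs.
function computeMetrics({ total, succeeded, totalDuration }) {
  const seconds = totalDuration / 1000;
  return {
    successRate: total > 0 ? (succeeded / total) * 100 : 0,
    throughput: seconds > 0 ? succeeded / seconds : 0
  };
}

// Made-up run: 9 of 10 documents succeed in 4.5 seconds.
const metrics = computeMetrics({ total: 10, succeeded: 9, totalDuration: 4500 });
console.log(`${metrics.successRate.toFixed(2)}%`);                // 90.00%
console.log(`${metrics.throughput.toFixed(2)} documents/second`); // 2.00 documents/second
```

Throughput counts only successful comparisons, matching the report, so a batch with many failures shows a lower figure than the raw processing rate.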
When implementing batch Word document comparison, consider these best practices:
- Validate Input Paths – Always check that source, target, and output directories exist before processing.
- Control Concurrency – Start with a modest concurrency (e.g., 3‑5) and adjust based on CPU, memory, and I/O characteristics.
- Use Progress Callbacks – Provide users with real‑time feedback to improve perceived performance.
- Enable Summary Pages Sparingly – Generating visual summary pages adds overhead; enable only when needed.
- Monitor Memory Usage – Large DOCX files can consume significant RAM; process in batches and release references after each comparison.
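For the concurrency advice above, a starting value can be derived from the machine's CPU count. The heuristic below is an assumption made for illustration, not a recommendation from the library: it leaves one core free, clamps the result to the 3‑5 range mentioned above, and falls back to 1 on machines with two cores or fewer. suggestConcurrency is a hypothetical name.

```javascript
const os = require('os');

// Hypothetical helper: picks a starting concurrency from the CPU count.
// Assumed heuristic: leave one core free, stay within 3-5, and use 1 on
// machines with two cores or fewer.
function suggestConcurrency(cpuCount = os.cpus().length) {
  if (cpuCount <= 2) {
    return 1;
  }
  return Math.min(5, Math.max(3, cpuCount - 1));
}

console.log(`Suggested starting concurrency: ${suggestConcurrency()}`);
```

The value is only a starting point; the benchmark example is the better guide once real documents are available.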
For more in‑depth information about batch Word document comparison, explore these technical resources:
- Document Comparison Using GroupDocs.Comparison – A step‑by‑step guide covering API basics, options, and advanced scenarios.
- Optimizing Performance for Large‑Scale Comparisons – Techniques for concurrency, memory management, and tuning CompareOptions.
- GroupDocs.Comparison API Reference – Full reference of classes, methods, and enumeration values.
GroupDocs.Comparison, Node.js, Java, batch comparison, Word, docx, document diff, parallel processing, performance optimization, compareWordPair, compareBatchSequential, compareBatchParallel, findWordPairs, generateSummaryReport, progress tracking, temporary license, document automation, API, JavaScript, npm, high‑volume, scalable, concurrency
For technical support, visit: