@GraemeF GraemeF commented Dec 23, 2025

Ticket: PAE-000

Summary

Adds diagnostic benchmark tooling for measuring Excel parser performance:

  • Parse benchmark (npm run benchmark:parse <file>) - Measures Excel file loading and parsing time, reports metadata fields, table counts, and row counts per table
  • Validation benchmark (npm run benchmark:validate <file>) - Measures both parse and validation time through the full data syntax validation pipeline, reports validation outcomes (INCLUDED/EXCLUDED/REJECTED) per table and any validation issues
  • Test file generator (npm run benchmark:generate <source> <rows> [output]) - Generates test files with a target row count by duplicating existing data into empty placeholder rows, preserving all Excel formatting, styles, and validation
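The split between parse time and validation time that the two benchmarks report can be sketched as a small timing harness. This is only an illustration, not the code in this PR: `parse` and `validate` are hypothetical stand-ins for the real pipeline stages, and `timed` and `benchmark` are names invented for the sketch.

```javascript
const { performance } = require('node:perf_hooks');

// Run one step (sync or async) and return its result with the elapsed seconds.
async function timed(step) {
  const start = performance.now();
  const result = await step();
  return { result, seconds: (performance.now() - start) / 1000 };
}

// `parse` and `validate` are hypothetical placeholders for the real stages;
// they are passed in so the harness itself stays independent of the parser.
async function benchmark(parse, validate, file) {
  const parsed = await timed(() => parse(file));
  const validated = await timed(() => validate(parsed.result));
  return {
    parseSeconds: parsed.seconds,
    validateSeconds: validated.seconds,
    totalSeconds: parsed.seconds + validated.seconds,
    outcome: validated.result
  };
}
```

Timing each stage separately is what makes it possible to report parse, validate, and total as distinct columns.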

Findings

Parse time is essentially O(1) with respect to row count. It is dominated by Excel file loading/decompression rather than row processing, rising only from 2.39s at 105 rows to 2.79s at 10,000 rows:

| Rows | Parse | Validate | Total |
|---:|---:|---:|---:|
| 105 | 2.39s | 0.02s | 2.40s |
| 1,000 | 2.42s | 0.09s | 2.50s |
| 5,000 | 2.64s | 0.39s | 3.02s |
| 10,000 | 2.79s | 0.73s | 3.52s |

Validation scales O(n) at ~0.07ms per row, but even at 10,000 rows it accounts for only about 21% of total time (0.73s of 3.52s).
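The per-row and percentage figures can be re-derived from the measurements in the table. The snippet below is a small worked calculation using only those numbers; the function names are made up for illustration.

```javascript
// Measurements copied from the findings table (times in seconds).
const measurements = [
  { rows: 105, parse: 2.39, validate: 0.02, total: 2.40 },
  { rows: 1000, parse: 2.42, validate: 0.09, total: 2.50 },
  { rows: 5000, parse: 2.64, validate: 0.39, total: 3.02 },
  { rows: 10000, parse: 2.79, validate: 0.73, total: 3.52 }
];

// Validation cost per row, in milliseconds.
const validateMsPerRow = (m) => (m.validate * 1000) / m.rows;

// Validation time as a fraction of total time.
const validateShare = (m) => m.validate / m.total;

// Extra parse time per additional row between two measurements, in milliseconds.
const parseMsPerExtraRow = (a, b) =>
  ((b.parse - a.parse) * 1000) / (b.rows - a.rows);

for (const m of measurements) {
  console.log(
    `${m.rows} rows: ${validateMsPerRow(m).toFixed(3)} ms/row validation, ` +
      `${(validateShare(m) * 100).toFixed(1)}% of total`
  );
}

const first = measurements[0];
const last = measurements[measurements.length - 1];
console.log(
  `parse grows by ${parseMsPerExtraRow(first, last).toFixed(3)} ms per extra row`
);
```

At the two larger sizes the validation cost works out to between 0.07 and 0.08 ms per row, and the parse stage adds well under 0.1 ms per extra row on top of its roughly 2.4s fixed cost.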

Files Changed

  • benchmarks/parse-file.js - Updated usage message (script renamed)
  • benchmarks/validate-file.js - New validation benchmark script
  • benchmarks/generate-test-file.js - New test file generator
  • benchmarks/README.md - Documentation for all three tools
  • package.json - Added npm scripts

- Add benchmark:generate script to create test files with configurable row counts
- Enhance benchmark:file to show per-table row breakdown
- Generator fills existing placeholder rows (fast) rather than inserting (slow)
- Parser performance is O(1) for row count - file loading dominates
- Add validate-file.js: benchmarks parse + validation pipeline
- Rename benchmark:file to benchmark:parse for clarity
- Add benchmark:validate script for full pipeline benchmarking
- Update README with all three benchmark commands
- Show parsed vs validated row counts separately
- Display warning when fatal errors block validation
- Show issue details with location (table, row, column)
- Make it clear when validation is blocked vs rows rejected
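The "fill existing placeholder rows rather than inserting" idea from the list above can be illustrated with plain arrays. This is a simplified sketch under stated assumptions, not the generator in this PR: the real tool works on an Excel workbook and preserves formatting, whereas here a sheet is an array of rows, a placeholder is assumed to be a row whose cells are all `null` or empty strings, and `fillPlaceholders` is a hypothetical name.

```javascript
// Assumption for this sketch: a placeholder row has only null or '' cells.
const isPlaceholder = (row) => row.every((cell) => cell === null || cell === '');

// Copy existing data rows, cycling through them, into placeholder slots until
// `targetRows` data rows exist or the placeholders run out.
// The total number of rows never changes, so nothing has to be shifted.
function fillPlaceholders(rows, targetRows) {
  const source = rows.filter((row) => !isPlaceholder(row));
  const filled = rows.map((row) => [...row]);
  if (source.length === 0) return filled; // nothing to duplicate

  let dataCount = source.length;
  let next = 0;
  for (let i = 0; i < filled.length && dataCount < targetRows; i++) {
    if (isPlaceholder(filled[i])) {
      filled[i] = [...source[next % source.length]];
      next++;
      dataCount++;
    }
  }
  return filled;
}

const sheet = [['a', 1], ['b', 2], [null, null], ['', ''], [null, null]];
console.log(fillPlaceholders(sheet, 4));
```

Overwriting slots that already exist avoids the row shifting that an insert would require, which is one plausible reason the fill approach is the faster of the two.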