Add invoice PDF renaming utility script by kzuraw · Pull Request #37 · kzuraw/tools

kzuraw · 2026-01-27T17:26:10Z

Summary

Add a new utility script rename_invoices.py that automatically renames invoice PDFs to a standardized format with date prefixes based on file creation metadata.

Changes

New script: python/rename_invoices.py - Renames invoice PDFs from company_name invoice_number.pdf to yyyy-mm company_name invoice_number.pdf format
- Extracts file creation date (year, month) from file metadata
- Formats invoice numbers by removing whitespace and converting / to -
- Sanitizes filenames to remove invalid characters
- Supports --dry-run flag to preview changes before applying them
- Skips files that already have date prefixes or have unexpected formats
- Handles edge cases like duplicate target filenames
Updated README.md: Added documentation section for the new script with usage instructions

Implementation Details

Uses file st_birthtime (creation time) when available, falls back to st_mtime for cross-platform compatibility
Implements robust error handling for malformed filenames and existing target files
Provides clear user feedback with counts of renamed and skipped files
Follows the same CLI pattern as the existing rename_epubs.py script using Click framework

https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

Extracts dates from PDF content and renames invoice files to 'yyyy-mm company_name invoice_number.pdf' format with formatted invoice numbers (whitespace removed, / converted to -). https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

Remove pypdf dependency - use file metadata instead of parsing PDF content. https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

claude · 2026-01-27T17:28:10Z

Code Review for PR #37: Add invoice PDF renaming utility script

Thank you for this contribution! The script follows the established patterns well and implements a useful utility. Here are my findings:

✅ Strengths

Consistent with project conventions: Follows the same structure as rename_epubs.py with PEP 723 inline dependencies and Click framework
Good error handling: Handles edge cases like duplicate files, malformed filenames, and files already having date prefixes
Clear user feedback: Provides informative output during both dry-run and actual execution
Cross-platform compatibility: Gracefully falls back from st_birthtime to st_mtime for systems that don't support creation time
Documentation: README.md is properly updated with usage instructions

🐛 Issues & Bugs

Critical: Missing backslash in sanitize_filename (python/rename_invoices.py:35)

The invalid_chars string is missing the backslash character. It should be: '<>:"/\|?*' to match rename_epubs.py:63. This is important for cross-platform compatibility.

Bug: Missing continue statement (python/rename_invoices.py:66-67)

When a file already has a date prefix, the code echoes a message but doesn't continue to skip further processing. Add an explicit 'continue' after line 67 to prevent the code from trying to parse and rename these files.

Potential Issue: Company name parsing too simplistic (python/rename_invoices.py:71-76)

The script assumes company name is the first word using split(maxsplit=1). This works for single-word companies but might not handle multi-word company names as expected. Consider documenting the expected input format more explicitly.

🔍 Code Quality & Best Practices

Python version inconsistency: rename_invoices.py uses >=3.12 while rename_epubs.py uses >=3.14. Consider standardizing across all scripts.
Sanitization differences: The sanitize_filename function differs from rename_epubs.py - it doesn't normalize whitespace with re.sub(). Consider adding this for consistency.
Date prefix regex: The regex allows any 4-digit year and 2-digit month without validation (e.g., month 99 would pass).

🚀 Performance

No concerns - the script efficiently processes files with minimal I/O operations.

🔒 Security

No security concerns identified. The script properly sanitizes filenames and provides dry-run mode.

📝 Suggestions

Add examples in the docstring showing input/output transformations
Consider counting files that already have date prefixes separately in the summary

Summary

The code is well-structured and functional, but please address the missing backslash in invalid_chars and add the explicit continue statement after detecting files with existing date prefixes. These are important for correctness and preventing potential bugs.

Great work overall! 🎉

- Process files that already have date prefix (strip and reprocess) - Update to Python 3.14 - Add missing backslash to invalid_chars for cross-platform compatibility https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

cloudflare-workers-and-pages · 2026-01-27T17:31:02Z

Deploying tools with Cloudflare Pages

Latest commit:	`8d663f5`
Status:	✅ Deploy successful!
Preview URL:	https://d2342909.tools-4nd.pages.dev
Branch Preview URL:	https://claude-rename-invoice-pdfs-6.tools-4nd.pages.dev

View logs

claude · 2026-01-27T17:32:08Z

Pull Request Review

Overall, this is a well-structured utility script that follows the repository's conventions. The code is clean and readable with good error handling. I've identified a few areas for improvement:

✅ Strengths

Consistent with existing patterns - Follows the same structure as rename_epubs.py (Click CLI, dry-run support, PEP 723 dependencies)
Good error handling - Checks for edge cases like duplicate filenames and malformed input
Clear user feedback - Informative output messages and proper summary statistics
Documentation - Well-documented with docstrings and README updates

🐛 Potential Bugs

1. Date prefix detection issue (python/rename_invoices.py:66)

if re.match(r"^\d{4}-\d{2}\s", stem):
    stem = re.sub(r"^\d{4}-\d{2}\s+", "", stem)

Issue: The regex patterns differ - match() requires exactly one space after the date, but sub() removes one or more spaces. If a file is named 2025-01 Company INV123.pdf, the match will succeed, but the substitution won't remove the date prefix properly.

Fix: Use consistent regex pattern:

if re.match(r"^\d{4}-\d{2}\s+", stem):
    stem = re.sub(r"^\d{4}-\d{2}\s+", "", stem)

2. Already-correctly-named files counted incorrectly (python/rename_invoices.py:96)

When a file is already correctly named (new_path == pdf_path), the script skips it but doesn't provide the same feedback as rename_epubs.py does. This could be confusing for users.

Current behavior: Silent skip (no counter increment)

Suggested improvement: Add explicit handling like in rename_epubs.py:95-97:

if new_path == pdf_path:
    click.echo(f"Already named correctly: {pdf_path.name}")
    continue

⚠️ Edge Cases to Consider

3. Company names with multiple words (python/rename_invoices.py:70)

parts = stem.split(maxsplit=1)

The current logic assumes the company name is a single word. If the input file is Acme Corp INV-123.pdf, it will parse as:

Company: Acme
Invoice: Corp INV-123

Impact: This might be the intended behavior if users are expected to use underscore-separated company names (e.g., Acme_Corp). However, the PR description shows company_name invoice_number.pdf which suggests underscores are already expected.

Recommendation: Document this requirement more explicitly in the docstring, or consider a more robust parsing strategy (e.g., looking for invoice number patterns).

4. Cross-platform date handling (python/rename_invoices.py:21)

timestamp = getattr(stat, "st_birthtime", None) or stat.st_mtime

Consideration: On Linux systems, st_birthtime is typically not available, so this will fall back to st_mtime (modification time). This could give unexpected results if PDFs are edited after creation. Consider documenting this platform-specific behavior.

🔒 Security

No security concerns identified. The script:

✅ Uses Path for safe filesystem operations
✅ Validates input directory exists
✅ Sanitizes filenames to prevent directory traversal
✅ Checks for existing files before overwriting

🚀 Performance

Performance is appropriate for the use case (processing local PDF files). No concerns.

📝 Minor Suggestions

5. Inconsistent whitespace handling

In rename_invoices.py:66, the sanitization differs from rename_epubs.py:66:

This PR: name.strip() only
rename_epubs.py: re.sub(r"\s+", " ", name) then name.strip()

The epub version normalizes multiple spaces to single spaces, which is more robust. Consider applying the same pattern here for consistency.

📋 Summary

Category	Status
Code Quality	✅ Good
Consistency	✅ Follows repo patterns
Bugs	⚠️ 1 minor regex bug (easily fixed)
Edge Cases	⚠️ Document company name format assumption
Security	✅ No concerns
Performance	✅ Appropriate

Recommended Actions

Must fix:

Fix the regex inconsistency in the date prefix detection (issue Create index.html with links to HTML files #1)

Nice to have:

Add "already named correctly" message handling (issue Create GitHub Pages build workflow #2)
Normalize whitespace in sanitize_filename() for consistency (issue Add CREATE TABLE formatting support #5)
Document the single-word company name requirement or enhance parsing

Despite these issues, the code is production-ready with the regex fix applied. Great work maintaining consistency with the existing codebase! 🎉

https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

claude · 2026-01-27T17:37:07Z

Pull Request Review

Thanks for this contribution! I've reviewed the PR and have the following feedback:

✅ Strengths

Consistent with existing patterns: The script follows the same structure as rename_epubs.py (Click framework, dry-run flag, similar output formatting)
Good error handling: Properly handles edge cases like duplicate files, unexpected formats, and malformed filenames
Cross-platform compatibility: Falls back to st_mtime when st_birthtime is unavailable
Clear documentation: README entry and docstrings are helpful

🐛 Bug: Date Prefix Stripping Logic

Location: python/rename_invoices.py:66-67

This logic strips existing date prefixes but then re-adds them based on current file creation time. This creates a problem where files already correctly named won't match new_path == pdf_path because the stem was modified. If file creation dates change (e.g., after copying), it could incorrectly rename already-correct files.

Recommendation: Remove the date-stripping logic (lines 65-67) and add an early check like rename_epubs.py:95-97 before the duplicate check.

🔧 Minor Issues

Filename parsing assumption (python/rename_invoices.py:69-73): Assumes company name is a single word. If input is "Big Company invoice 123.pdf", it parses company="Big" and invoice="Company invoice 123". Document this assumption or enhance parsing.
Inconsistency with sanitize_filename: rename_epubs.py normalizes multiple spaces but rename_invoices.py doesn't. Add name = re.sub(r"\s+", " ", name) for consistency.

🎯 Code Quality

Type hints, function decomposition, and PEP 723 compliance all look good!

🔒 Security & Performance

No concerns identified.

Summary

Solid addition! Please address the date prefix stripping bug and consider the minor improvements.

🤖 Review by Claude Code

- Skip files that already have yyyy-mm prefix instead of re-processing - Add early check for files already named correctly before duplicate check https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

claude · 2026-01-27T17:45:18Z

PR Review: Add invoice PDF renaming utility script

Overall Assessment

Good implementation that follows the established patterns in the repository. The code is clean, well-documented, and includes proper error handling. However, there are a few issues that should be addressed before merging.

Issues Found

1. Critical: Missing duplicate filename handling in rename_epubs.py pattern ⚠️

The new script handles duplicate target filenames by skipping them, but rename_epubs.py doesn't check for this case. If two different epub files have the same author/title metadata, the second rename will fail with a confusing error. This is actually a bug in the existing script that this PR inadvertently highlights.

Recommendation: This PR's approach is correct. Consider fixing rename_epubs.py in a follow-up PR to match this pattern.

2. Bug: Inconsistent sanitize_filename behavior

The sanitize_filename() function differs between the two scripts:

rename_invoices.py (line 32-37):

def sanitize_filename(name: str) -> str:
    """Remove characters that are invalid in filenames."""
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        name = name.replace(char, "")
    return name.strip()

rename_epubs.py (line 61-67):

def sanitize_filename(name: str) -> str:
    """Remove characters that are invalid in filenames."""
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        name = name.replace(char, "")
    name = re.sub(r"\s+", " ", name)  # <-- Missing in rename_invoices.py
    return name.strip()

The epub version normalizes multiple whitespace characters to a single space, but the invoice version doesn't. This could lead to filenames with excessive whitespace if the company name or invoice number has multiple consecutive spaces.

Recommendation: Add the whitespace normalization line to rename_invoices.py.

3. Potential issue: Filename parsing assumption

Line 71 uses stem.split(maxsplit=1) which assumes the company name is a single word. This means:

✅ Acme 12345.pdf → works
❌ Acme Corp 12345.pdf → fails (interprets "Acme" as company, "Corp 12345" as invoice number)

Recommendation: Document this limitation clearly in the docstring and consider adding an example of expected input format.

4. Minor: Platform compatibility note

Line 21 uses st_birthtime with a fallback to st_mtime. The comment says "fall back to mtime" but st_birthtime is only available on macOS and some BSD systems. On Linux, this will always use st_mtime (modification time), which may not accurately represent when the invoice was created if the file was copied or modified.

Recommendation: Consider documenting this behavior in the docstring or README.

Code Quality ✅

Strengths:

✅ Follows PEP 723 inline dependency specification correctly
✅ Consistent with existing rename_epubs.py pattern
✅ Good error handling for malformed filenames and edge cases
✅ Clear user feedback with counts
✅ Proper use of Click framework
✅ Type hints on function signatures
✅ Skip logic for already-processed files
✅ Dry-run functionality implemented correctly

Style:

✅ Clean, readable code
✅ Good function decomposition
✅ Appropriate docstrings

Security Considerations ✅

✅ Sanitizes filenames to prevent path traversal or invalid characters
✅ No arbitrary code execution risks
✅ Proper use of Path objects instead of string concatenation

Performance Considerations ✅

✅ Efficient for typical use cases (small number of PDFs)
✅ No unnecessary file I/O

Documentation ✅

✅ README.md updated appropriately
✅ Usage examples provided
✅ Follows existing documentation pattern

Recommendations Summary

Must fix before merge:

Add whitespace normalization to sanitize_filename() to match rename_epubs.py

Should consider:
2. Document the single-word company name limitation
3. Add a note about file date metadata behavior across platforms

Follow-up work:
4. Consider extracting sanitize_filename() to a shared utility module to maintain consistency
5. Add duplicate filename handling to rename_epubs.py in a separate PR

Overall, this is a solid PR with good code quality. The main issue is the missing whitespace normalization. Once addressed, this should be good to merge!

claude added 2 commits January 27, 2026 17:22

Add invoice PDF rename tool

7a21e59

Extracts dates from PDF content and renames invoice files to 'yyyy-mm company_name invoice_number.pdf' format with formatted invoice numbers (whitespace removed, / converted to -). https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

Simplify invoice rename to use file creation date

a8278d2

Remove pypdf dependency - use file metadata instead of parsing PDF content. https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

Fix inconsistent regex pattern for date prefix detection

d0f7cf9

https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

Skip files with date prefix, add already-correct check

8d663f5

- Skip files that already have yyyy-mm prefix instead of re-processing - Add early check for files already named correctly before duplicate check https://claude.ai/code/session_01AgLRG2yeADzgzH9VjTPUW2

kzuraw merged commit 0558145 into main Jan 27, 2026
2 checks passed

kzuraw deleted the claude/rename-invoice-pdfs-6Vk4c branch January 27, 2026 17:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add invoice PDF renaming utility script#37

Add invoice PDF renaming utility script#37
kzuraw merged 5 commits intomainfrom
claude/rename-invoice-pdfs-6Vk4c

kzuraw commented Jan 27, 2026

Uh oh!

claude bot commented Jan 27, 2026

Uh oh!

cloudflare-workers-and-pages bot commented Jan 27, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 27, 2026

Uh oh!

claude bot commented Jan 27, 2026

Uh oh!

claude bot commented Jan 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kzuraw commented Jan 27, 2026

Summary

Changes

Implementation Details

Uh oh!

claude bot commented Jan 27, 2026

Code Review for PR #37: Add invoice PDF renaming utility script

✅ Strengths

🐛 Issues & Bugs

Critical: Missing backslash in sanitize_filename (python/rename_invoices.py:35)

Bug: Missing continue statement (python/rename_invoices.py:66-67)

Potential Issue: Company name parsing too simplistic (python/rename_invoices.py:71-76)

🔍 Code Quality & Best Practices

🚀 Performance

🔒 Security

📝 Suggestions

Summary

Uh oh!

cloudflare-workers-and-pages bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying tools with Cloudflare Pages

Uh oh!

claude bot commented Jan 27, 2026

Pull Request Review

✅ Strengths

🐛 Potential Bugs

⚠️ Edge Cases to Consider

🔒 Security

🚀 Performance

📝 Minor Suggestions

📋 Summary

Recommended Actions

Uh oh!

claude bot commented Jan 27, 2026

Pull Request Review

✅ Strengths

🐛 Bug: Date Prefix Stripping Logic

🔧 Minor Issues

🎯 Code Quality

🔒 Security & Performance

Summary

Uh oh!

claude bot commented Jan 27, 2026

PR Review: Add invoice PDF renaming utility script

Overall Assessment

Issues Found

1. Critical: Missing duplicate filename handling in rename_epubs.py pattern ⚠️

2. Bug: Inconsistent sanitize_filename behavior

3. Potential issue: Filename parsing assumption

4. Minor: Platform compatibility note

Code Quality ✅

Strengths:

Style:

Security Considerations ✅

Performance Considerations ✅

Documentation ✅

Recommendations Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloudflare-workers-and-pages bot commented Jan 27, 2026 •

edited

Loading