@MALathon
Owner

Summary

This PR adds sort ordering, URL deduplication, custom output naming, a ProcessResult return type, and CLI verbosity controls to fetcharoo.

Changes

New Features

| Feature | API | CLI |
| --- | --- | --- |
| Sort ordering | sort_by='numeric', sort_key=fn | --sort-by numeric |
| Deduplication | deduplicate=True (default) | N/A (always on) |
| Output naming | output_name='book.pdf' | --output-name book.pdf |
| Rich results | Returns ProcessResult | N/A |
| Verbosity | Named logger 'fetcharoo' | -q, -qq, -v, -vv |

Files Changed

  • fetcharoo/fetcharoo.py - Core implementation of all features
  • fetcharoo/cli.py - CLI argument parsing and logging configuration
  • fetcharoo/__init__.py - Export new symbols (ProcessResult, SORT_BY_OPTIONS)
  • README.md - Documentation for all new features
  • tests/test_enhancements.py - 32 new tests for all features
  • tests/test_*.py - Updated mocks to use named logger

Test Plan

  • All 244 tests pass (212 existing + 32 new)
  • Sort ordering works with numeric, alpha, alpha_desc, and custom keys
  • Deduplication removes duplicate URLs while preserving order
  • Custom output names are applied to merged PDFs
  • ProcessResult provides accurate download statistics
  • CLI verbosity flags adjust logging levels correctly
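The verbosity behaviour checked in the last item can be pictured with a short sketch. This is not the code in fetcharoo/cli.py: the `build_parser` and `resolve_log_level` helpers are hypothetical, and the levels chosen for a single `-v`, a single `-q`, and `-qq` are assumptions. The PR itself only states that the default is WARNING and that `-vv` enables debug output.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two verbosity flags."""
    parser = argparse.ArgumentParser(prog="fetcharoo")
    # action="count" turns -v into 1, -vv into 2, and so on
    parser.add_argument("-q", "--quiet", action="count", default=0)
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def resolve_log_level(quiet: int, verbose: int) -> int:
    """Map flag counts to a logging level (the mapping is an assumption)."""
    if verbose >= 2:
        return logging.DEBUG      # -vv: debug, as stated in the PR
    if verbose == 1:
        return logging.INFO       # -v: assumed
    if quiet >= 2:
        return logging.CRITICAL   # -qq: assumed
    if quiet == 1:
        return logging.ERROR      # -q: assumed
    return logging.WARNING        # default: quiet, as stated in the PR


args = build_parser().parse_args(["-vv"])
logger = logging.getLogger("fetcharoo")  # named logger, not the root logger
logger.setLevel(resolve_log_level(args.quiet, args.verbose))
```

Configuring a named logger rather than the root logger keeps an application that imports the package in control of its own logging setup.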

Closes

Closes #3, closes #4, closes #5, closes #6, closes #7, closes #8

Closes #3

- Add sort_by parameter with options: 'numeric', 'alpha', 'alpha_desc', 'none'
- Add sort_key parameter for custom sort functions
- 'numeric' extracts numbers from filenames for proper chapter ordering
- Update CLI with --sort-by option
- Export SORT_BY_OPTIONS constant
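To illustrate why a numeric sort matters for chapter ordering, here is a minimal sketch of a key function that could be passed as `sort_key`. The `numeric_sort_key` helper is hypothetical and is not necessarily how fetcharoo implements `sort_by='numeric'`. It assumes the first run of digits in the filename is the chapter number, and that files without digits should sort last.

```python
import posixpath
import re
from urllib.parse import urlparse


def numeric_sort_key(url: str):
    """Hypothetical key: first integer in the filename, then the filename."""
    filename = posixpath.basename(urlparse(url).path)
    match = re.search(r"\d+", filename)
    # Files without any digits sort after all numbered files (assumption)
    number = int(match.group()) if match else float("inf")
    return (number, filename)


urls = [
    "https://example.com/chapter10.pdf",
    "https://example.com/chapter2.pdf",
    "https://example.com/chapter1.pdf",
]

# Plain string sorting would place chapter10 before chapter2
sorted_urls = sorted(urls, key=numeric_sort_key)
```

Returning a tuple keeps the order deterministic when two files share the same number.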

Closes #4

- Add deduplicate parameter (default True) to remove duplicate PDF URLs
- Track seen PDFs across recursive calls using internal _seen_pdfs set
- Preserves discovery order (first occurrence wins)
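The "first occurrence wins" behaviour can be sketched as follows. The `dedupe_preserving_order` function is a hypothetical stand-in, not fetcharoo's internal code. Its `seen` argument plays the role the PR describes for `_seen_pdfs`: a set shared across recursive calls, so a PDF linked from several pages is kept only once.

```python
from typing import Iterable, List, Optional, Set


def dedupe_preserving_order(
    urls: Iterable[str], seen: Optional[Set[str]] = None
) -> List[str]:
    """Return urls without duplicates, keeping the first occurrence of each."""
    if seen is None:
        seen = set()
    unique: List[str] = []
    for url in urls:
        if url not in seen:
            seen.add(url)       # remember it for later calls sharing this set
            unique.append(url)  # discovery order is preserved
    return unique


# One set shared across two "pages", mimicking a recursive crawl
seen_pdfs: Set[str] = set()
page_one = dedupe_preserving_order(["a.pdf", "b.pdf", "a.pdf"], seen_pdfs)
page_two = dedupe_preserving_order(["b.pdf", "c.pdf"], seen_pdfs)
```

Here `page_two` contains only `c.pdf`, because `b.pdf` was already recorded while processing the first page.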

Closes #5

- Add output_name parameter to process_pdfs and download_pdfs_from_webpage
- Add --output-name CLI option for merge mode
- Filename is sanitized for security
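The PR does not say what the sanitization involves, so the following is only a sketch of one common approach: keep the final path component, replace characters outside a conservative whitelist, and fall back to a default name. The `sanitize_filename` helper, the `merged.pdf` default, and the automatic `.pdf` suffix are all assumptions for illustration.

```python
import posixpath
import re


def sanitize_filename(name: str, default: str = "merged.pdf") -> str:
    """Hypothetical sanitizer for a user-supplied output name."""
    # Treat backslashes as separators too, then drop any directory part
    base = posixpath.basename(name.replace("\\", "/"))
    # Replace anything outside a conservative character whitelist
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Avoid hidden files and names made only of dots, such as ".."
    base = base.lstrip(".")
    if not base:
        return default
    if not base.lower().endswith(".pdf"):
        base += ".pdf"
    return base
```

With this sketch, `sanitize_filename("../../etc/passwd")` yields `passwd.pdf`, so the output cannot escape the target directory.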

Closes #6

- Add ProcessResult with success, files_created, downloaded_count, filtered_count, failed_count, errors
- ProcessResult has __bool__ for backward compatibility in boolean contexts
- Update process_pdfs to return ProcessResult
- Update download_pdfs_from_webpage return type
- Export ProcessResult from package
- Use dedicated 'fetcharoo' logger instead of root logger
- Set default log level to WARNING (quiet by default)
- Change verbose "Finding PDFs from" messages to DEBUG level
- Add -q/--quiet flag to CLI to reduce output (-qq for even quieter)
- Add -v/--verbose flag to CLI to increase output (-vv for debug)
- Update tests to mock the named logger
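For the ProcessResult items above, a minimal sketch of a result dataclass shows how `__bool__` keeps existing `if result:` checks working. The field names come from the PR; the types and defaults are assumptions, and this is an illustrative stand-in rather than the class defined in fetcharoo/fetcharoo.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessResult:
    """Illustrative stand-in; field types and defaults are assumed."""
    success: bool = False
    files_created: List[str] = field(default_factory=list)
    downloaded_count: int = 0
    filtered_count: int = 0
    failed_count: int = 0
    errors: List[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        # Callers that previously received a plain bool can still write
        # `if result:` without changes
        return self.success


result = ProcessResult(success=True, files_created=["book.pdf"], downloaded_count=3)
if result:
    summary = f"{result.downloaded_count} downloaded, {result.failed_count} failed"
```

Using `default_factory` for the list fields prevents separate instances from sharing one mutable list.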

Closes #7, closes #8

- Add comprehensive test suite for enhancements (32 new tests):
  - Sort ordering tests (numeric, alpha, alpha_desc, custom key)
  - Deduplication tests
  - Custom output filename tests
  - ProcessResult dataclass tests
  - CLI verbosity flag tests
- Update README with new features documentation:
  - New CLI options (--sort-by, --output-name, -q, -v)
  - Usage examples for sorting, ProcessResult
  - Updated API reference with new parameters

@MALathon MALathon merged commit 784a75f into main Dec 15, 2025
4 checks passed