Copilot AI commented Nov 22, 2025

Implements automated verification that literature/pdfs/ALL_FLAT/ remains the canonical PDF source and stays in sync with corpus_manifest.json.

Changes

  • verify_pdf_ssot.py: Verification script with 5 checks (completeness in both directions between the directory and the manifest, hash integrity, uniqueness, validity)
  • test_verify_pdf_ssot.py: Test suite covering all verification scenarios
  • Makefile: Added verify-ssot and test-ssot targets
  • Documentation: Report showing current status and maintenance guidelines
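
As a rough illustration of the completeness, hash-integrity, and uniqueness checks listed above, here is a minimal sketch. It is not the code in verify_pdf_ssot.py. It assumes the manifest can be reduced to a flat mapping of file name to SHA-256 hex digest, which may not match the real corpus_manifest.json schema, and the names `sha256_of` and `check_sync` are hypothetical. The validity check is left out of this sketch.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sync(pdf_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare a PDF directory against a {file name: sha256} mapping.

    Assumption: the manifest has already been flattened to this shape.
    Each key in the result maps to a sorted list of offending file names.
    """
    on_disk = {p.name: sha256_of(p) for p in pdf_dir.glob("*.pdf")}

    # Group file names by digest so identical content shows up as a group.
    by_hash: dict[str, list[str]] = {}
    for name, digest in on_disk.items():
        by_hash.setdefault(digest, []).append(name)

    return {
        # Completeness, manifest -> directory
        "missing_from_disk": sorted(set(manifest) - set(on_disk)),
        # Completeness, directory -> manifest
        "missing_from_manifest": sorted(set(on_disk) - set(manifest)),
        # Hash integrity for files present on both sides
        "hash_mismatch": sorted(
            name for name, digest in on_disk.items()
            if name in manifest and manifest[name] != digest
        ),
        # Uniqueness: every file whose content is shared with another file
        "duplicate_content": sorted(
            name for names in by_hash.values() if len(names) > 1 for name in names
        ),
    }
```

An empty list under every key would mean the directory and the manifest agree under these assumptions.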

Findings

Verification identified 1 corrupted PDF (fpls-15-1268101.pdf): the file contains an XML error response from a failed download rather than valid PDF content. The other 113 PDFs validated successfully.
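
A failed download of this kind can usually be caught by looking at the first bytes of the file, since a PDF starts with the `%PDF-` header while an error payload typically starts with markup. The sketch below shows one way such a validity check might look. The function name `classify_pdf` and its return labels are hypothetical, and the actual check in verify_pdf_ssot.py may work differently.

```python
from pathlib import Path


def classify_pdf(path: Path) -> str:
    """Classify a file by its leading bytes.

    Returns "pdf", "markup" (XML/HTML payload), "empty", or "unknown".
    This is a header heuristic only; it does not parse the PDF structure.
    """
    with path.open("rb") as fh:
        head = fh.read(1024)

    # Drop a UTF-8 byte order mark, then leading whitespace.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip()

    if not head:
        return "empty"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"<"):
        # Likely an XML or HTML error page saved under a .pdf name.
        return "markup"
    return "unknown"
```

Anything other than `"pdf"` would be reported as invalid; the `"markup"` label simply makes the likely cause (a saved error page) visible in the report.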

Usage

make verify-ssot          # Run verification
python3 create_manifest.py  # Regenerate manifest after changes

The verification is designed to be integrated into CI/CD or pre-commit hooks to prevent SSOT drift.
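
One possible shape for a pre-commit hook is sketched below. It assumes that `make verify-ssot` exits with a non-zero status when verification fails, which this PR does not state explicitly; `run_check` and `main` are hypothetical helper names.

```python
#!/usr/bin/env python3
import subprocess
import sys


def run_check(command: list[str]) -> int:
    """Run a verification command and return its exit code."""
    try:
        return subprocess.run(command, check=False).returncode
    except FileNotFoundError:
        # The executable (for example make) is not installed or not on PATH.
        print(f"pre-commit: command not found: {command[0]}", file=sys.stderr)
        return 127


def main() -> int:
    """Run the SSOT verification target and report failure to the user."""
    # Assumption: the Makefile target signals failure via its exit status.
    code = run_check(["make", "verify-ssot"])
    if code != 0:
        print("pre-commit: PDF SSOT verification failed; commit aborted.",
              file=sys.stderr)
    return code
```

To use it as a Git hook, the file would end with `sys.exit(main())`, be saved as `.git/hooks/pre-commit`, and be marked executable. Git aborts the commit whenever the hook exits with a non-zero status.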

Original prompt

verify ssot for pdf sources


Copilot AI self-assigned this Nov 22, 2025
Copilot AI and others added 4 commits November 22, 2025 10:31
- Add verify_pdf_ssot.py script with 5 verification checks
- Add comprehensive test suite (test_verify_pdf_ssot.py)
- Document findings in SSOT_VERIFICATION_REPORT.md
- Identified 1 corrupted PDF file (fpls-15-1268101.pdf)

Co-authored-by: Camier <8348793+Camier@users.noreply.github.com>
- Add verify-ssot and test-ssot targets to Makefile
- Update README with PDF SSOT documentation and verification instructions

Co-authored-by: Camier <8348793+Camier@users.noreply.github.com>
- Make test PDF content difference more explicit
- Improve hash display for duplicates (first 8 + last 8 chars)

Co-authored-by: Camier <8348793+Camier@users.noreply.github.com>
Co-authored-by: Camier <8348793+Camier@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Verify single source of truth for PDF sources" to "Add SSOT verification for PDF sources" Nov 22, 2025
Copilot AI requested a review from Camier November 22, 2025 10:38