SPACESODA/pdf2txt

pdf2txt: PDF to Text Converter

Convert PDFs into clean, LLM-ready text.

pdf2txt runs entirely in your browser, so your files never leave your device. It extracts readable text from PDFs and cleans common artifacts like hyphenation and broken lines while preserving headings and structure when possible, making the output ready for LLMs.

✨ Features

  • LLM-ready Output: Automatically cleans up text by:
    • Rejoining words split across lines by hyphenation.
    • Merging hard-wrapped lines while preserving paragraphs.
    • Detecting and formatting headers.
    • Removing page numbers (basic detection).
  • Fast Batch Processing: Drag and drop multiple PDFs to convert them all at once.
  • Password-protected PDFs: Unlock files with a password and convert them locally.
  • Bulk Download: Download all converted files as a single ZIP archive.
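
The cleanup steps listed under "LLM-ready Output" can be illustrated with a small sketch. This is not the project's code (pdf2txt runs in the browser, and its implementation is not shown here). It is a minimal Python illustration of the general heuristics, and the names `clean_text`, `join_lines`, and `PAGE_NUMBER` are hypothetical. It assumes paragraphs are separated by blank lines and that a page number is a line containing only a short number, optionally prefixed by "Page". Header detection is left out.

```python
import re

# Assumption: a page-number line holds only 1-4 digits, optionally after "Page".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def join_lines(lines: list[str]) -> str:
    """Merge the hard-wrapped lines of one paragraph into a single line."""
    text = lines[0]
    for nxt in lines[1:]:
        ends_with_split_word = (
            len(text) > 1
            and text.endswith("-")
            and text[-2].isalpha()
            and nxt[:1].islower()
        )
        if ends_with_split_word:
            # "exam-" + "ple" -> "example": drop the hyphen and join directly.
            text = text[:-1] + nxt
        else:
            # Ordinary wrapped line: join with a single space.
            text = text + " " + nxt
    return text


def clean_text(raw: str) -> str:
    """Drop page-number lines, then merge wrapped lines paragraph by paragraph."""
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in raw.splitlines():
        stripped = line.strip()
        if PAGE_NUMBER.match(stripped):
            continue  # basic page-number detection: skip the line entirely
        if not stripped:
            # A blank line closes the current paragraph, if there is one.
            if current:
                paragraphs.append(current)
                current = []
            continue
        current.append(stripped)
    if current:
        paragraphs.append(current)
    return "\n\n".join(join_lines(p) for p in paragraphs)


if __name__ == "__main__":
    sample = (
        "This is an exam-\nple of hard-wrapped\ntext.\n\n12\n\n"
        "Second para-\ngraph here."
    )
    print(clean_text(sample))
```

The lowercase check on the following line is a deliberate trade-off in this sketch: it avoids merging a trailing dash with a capitalized word or a new sentence, at the cost of missing some legitimate splits. Words that are hyphenated in their own right and happen to break at the hyphen (for example "well-" followed by "known") would lose the hyphen under this heuristic.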

License

This project is released under the MIT License.