This project focuses on preprocessing receipt images to enhance text extraction quality. The pipeline is designed to improve image clarity and contrast before running OCR or text extraction techniques. Tesseract OCR was dropped in favor of AWS Textract, and the final text extraction pipeline is documented in the ReceiptProcessing repository.
This notebook handles:
- Loading and processing receipt images
- Applying transformations to enhance text visibility
- Saving processed images for further text extraction
Key points:
- Image preprocessing techniques aimed at improving text extraction
- Python-driven approach that can be automated
- Tesseract OCR was replaced by AWS Textract, which gave better results
- The current receipt text extraction pipeline lives in the ReceiptProcessing repository
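
The kind of transformation described above (improving clarity and contrast before text extraction) can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the function names `to_grayscale`, `stretch_contrast`, and `binarize`, the choice of these three steps, and the default threshold of 128 are all assumptions made for the example. A real pipeline would more likely use an imaging library and read the receipt images from disk.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]  # rows of (r, g, b) pixels, 0-255
GrayImage = List[List[int]]                  # rows of grayscale values, 0-255


def to_grayscale(img: RGBImage) -> GrayImage:
    """Convert RGB pixels to grayscale with BT.601 luma weights (integer math)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in img]


def stretch_contrast(img: GrayImage) -> GrayImage:
    """Linearly rescale values so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: there is no contrast to stretch, so return a copy.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def binarize(img: GrayImage, threshold: int = 128) -> GrayImage:
    """Map each pixel to pure white (255) or pure black (0) around a fixed threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


# Hypothetical 2x2 "receipt" with low contrast (values between 100 and 200).
sample: RGBImage = [
    [(100, 100, 100), (150, 150, 150)],
    [(125, 125, 125), (200, 200, 200)],
]

gray = to_grayscale(sample)         # [[100, 150], [125, 200]]
stretched = stretch_contrast(gray)  # [[0, 127], [63, 255]]
binary = binarize(stretched)        # [[0, 0], [0, 255]]
print(binary)
```

A fixed threshold is the simplest option for illustration. Receipts photographed under uneven lighting usually call for an adaptive threshold instead, which this sketch does not attempt.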