Overview

This project preprocesses receipt images to improve text extraction quality. The pipeline improves image clarity and contrast before the images are passed to OCR or another text extraction service. Tesseract OCR was dropped in favor of AWS Textract; the final text extraction pipeline is documented in the ReceiptProcessing repository.
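The README does not list which transformations the notebook applies, so the following is only a generic sketch of the kind of contrast enhancement described above, not the notebook's actual code. It assumes a grayscale image held as a list of rows of 0 to 255 integers. It stretches the contrast to the full range and then binarizes the image with Otsu's method, which separates dark ink from light paper. Pure Python keeps it dependency-free. With OpenCV, the thresholding step would normally be `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
from __future__ import annotations


def stretch_contrast(img: list[list[int]]) -> list[list[int]]:
    """Linearly rescale grayscale pixels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def otsu_threshold(img: list[list[int]]) -> int:
    """Pick the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background; larger t cannot help
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(img: list[list[int]], t: int) -> list[list[int]]:
    """Map pixels <= t to 0 (ink) and the rest to 255 (paper)."""
    return [[0 if p <= t else 255 for p in row] for row in img]


# Example: a tiny, low-contrast "receipt" patch.
patch = [[50, 60, 200], [55, 210, 205]]
stretched = stretch_contrast(patch)
clean = binarize(stretched, otsu_threshold(stretched))
```

Stretching before thresholding makes faint ink on grey thermal paper easier to separate. Otsu's method chooses the cut point from the image itself, so no fixed threshold has to be tuned per receipt.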

Repository Contents

1. PreProcessFinal.ipynb

This notebook handles:

Loading and processing receipt images
Applying transformations to enhance text visibility
Saving processed images for further text extraction
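The load, transform, and save steps above can be driven by a small batch loop. This is a hedged sketch, not the notebook's code. The accepted file suffixes and the `preprocess_folder` name are hypothetical. `transform` stands in for decode → enhance → encode, which in practice would use an imaging library such as OpenCV or Pillow.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: which receipt image formats the batch should pick up.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def preprocess_folder(src: Path, dst: Path, transform: Callable[[bytes], bytes]) -> List[Path]:
    """Load every receipt image in src, apply transform, and save the result under dst."""
    dst.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip folders such as .ipynb_checkpoints and non-image files
        out_path = dst / path.name  # keep the original file name for traceability
        out_path.write_bytes(transform(path.read_bytes()))
        written.append(out_path)
    return written
```

Keeping the output file names identical to the inputs makes it easy to match each processed image with its original when checking extraction results.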

Key Highlights

Image preprocessing techniques for improved text extraction
Python-based pipeline that can be automated
Switched from Tesseract OCR to AWS Textract for better extraction results
See the ReceiptProcessing repository for the updated receipt text extraction pipeline
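The Textract step itself lives in the ReceiptProcessing repository. The following is only a hedged sketch of how a preprocessed image might be passed to Textract's DetectDocumentText API and its text lines collected, not the code from that repository. It assumes boto3 is installed and AWS credentials are configured. The region `us-east-1` is an arbitrary placeholder, and the helper names are hypothetical.

```python
from __future__ import annotations


def extract_lines(response: dict) -> list[str]:
    """Return the text of every LINE block in a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text(image_bytes: bytes, region_name: str = "us-east-1") -> list[str]:
    """Send one preprocessed image to Textract and return its text lines.

    Assumption: AWS credentials are available; the region is a placeholder.
    """
    import boto3  # imported lazily so extract_lines works without boto3 installed

    client = boto3.client("textract", region_name=region_name)
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

Separating the response parsing from the API call means the parsing can be checked against saved responses without calling AWS.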

Callowlock/PineappleExpenseBackend