This project automates the extraction of three critical transaction details from financial receipts and statements: Terminal ID, STAN (System Trace Audit Number), and RRN (Retrieval Reference Number). It uses EasyOCR for text recognition and PDF2Image for PDF-to-image conversion.
✅ Convert PDFs to images for OCR processing
✅ Extract structured text using EasyOCR
✅ Preprocess and clean extracted data to improve accuracy
✅ Use Regular Expressions (Regex) to retrieve key transaction details
✅ Print extracted text and transaction details for debugging
✅ Lightweight and efficient Python-based implementation
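The cleaning and regex steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `PATTERNS`, `clean_text`, and `extract_transaction_details` are hypothetical, and the patterns assume receipts that label the fields as "Terminal ID" (or "TID"), "STAN", and "RRN", with the ISO 8583 lengths of 8, 6, and 12 characters. Real receipts may use different labels, so the patterns would likely need adjusting.

```python
import re

# Hypothetical patterns: the labels and field lengths are assumptions
# (ISO 8583 defines Terminal ID as 8 chars, STAN as 6 digits, RRN as 12 chars).
PATTERNS = {
    "terminal_id": re.compile(
        r"\b(?:TERMINAL\s*ID|TID)\s*[:#\-]?\s*([A-Z0-9]{8})\b", re.IGNORECASE
    ),
    "stan": re.compile(r"\bSTAN\s*[:#\-]?\s*(\d{6})\b", re.IGNORECASE),
    "rrn": re.compile(r"\bRRN\s*[:#\-]?\s*([A-Z0-9]{12})\b", re.IGNORECASE),
}


def clean_text(text):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_transaction_details(text):
    """Return a dict with the first match for each field, or None if absent."""
    cleaned = clean_text(text)
    details = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(cleaned)
        details[name] = match.group(1) if match else None
    return details


sample = "Terminal ID: 12345678\nSTAN : 004521\nRRN: 123456789012\nAmount 10.00"
print(extract_transaction_details(sample))
```

Returning `None` for a missing field, rather than raising, keeps the debugging output readable when OCR misreads a label.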
- Python (os, re, and logging from the standard library, plus Pillow/PIL)
- EasyOCR (Deep-learning-based OCR for text extraction)
- PDF2Image (Convert PDFs into images for OCR processing)
- Regular Expressions (Extract structured transaction details)
Ensure you have Python installed. Then, install the required dependencies (note that pdf2image also needs Poppler installed on your system):
pip install easyocr pdf2image pillow
- Place your PDF receipt in the project folder.
- Run the script to extract transaction details:
python utils.py <path_to_pdf>
- View extracted text and transaction details in the terminal.
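For orientation, the PDF-to-text part of such a script might look like the sketch below. It is an assumption about how the pieces fit together, not the contents of `utils.py`: the function names `pdf_to_text` and `main` are hypothetical, English is assumed as the OCR language, and 300 DPI is an arbitrary starting point. The third-party imports sit inside the function so the file can be imported before the dependencies are installed.

```python
import sys


def pdf_to_text(pdf_path, dpi=300, languages=("en",)):
    """OCR every page of a PDF and return the recognised text as one string."""
    # Local imports: numpy is installed alongside EasyOCR,
    # and pdf2image needs Poppler available on the system.
    import numpy as np
    import easyocr
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images, one per page
    reader = easyocr.Reader(list(languages), gpu=False)  # CPU-only for portability

    lines = []
    for page in pages:
        # detail=0 returns plain strings instead of (box, text, confidence) tuples
        lines.extend(reader.readtext(np.array(page), detail=0))
    return "\n".join(lines)


def main(argv):
    """Print the OCR text for the PDF given on the command line."""
    if len(argv) != 2:
        print("usage: python utils.py <path_to_pdf>", file=sys.stderr)
        return 2
    print(pdf_to_text(argv[1]))
    return 0
```

To wire this to the command line, call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Creating the `easyocr.Reader` is slow because it loads model weights, so a script that processes many PDFs would do better to build it once and reuse it.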
🎥 Watch the full demo on LinkedIn
- ✅ NLP-based text correction for improved accuracy
- ✅ Multi-language support for receipts from various regions
- ✅ Web interface for real-time transaction processing using Django
Pull requests are welcome! If you’d like to contribute, please fork the repository and submit a PR.
🚀 Let’s connect! If you find this project useful, feel free to star ⭐ the repo and connect with me on LinkedIn.

