The PAN Validation Project automates the process of validating Permanent Account Numbers (PAN) using SQL and Python, and now includes a Streamlit-based web app for easy, interactive validation.
A PAN (Permanent Account Number) is a unique 10-character alphanumeric identifier issued by the Income Tax Department of India. Validating PANs is critical for ensuring data integrity in financial transactions, tax compliance, and KYC processes.
This project provides:
- Two complementary backend approaches — one using Microsoft SQL Server and another using Python (Pandas).
- A Streamlit frontend that allows users to validate PANs instantly or upload Excel files for bulk validation and visualization.
- ✅ Validate PAN numbers based on official structural and logical rules.
- ✅ Perform data cleaning, validation, and reporting using both SQL and Python.
- ✅ Compare performance and flexibility between SQL-based and Python-based solutions.
- ✅ Provide a user-friendly Streamlit interface for instant and batch validation.
- ✅ Generate reports summarizing valid and invalid PANs visually and in Excel format.
The project now includes three components:
1️⃣ SQL Implementation
2️⃣ Python Script Implementation
3️⃣ Streamlit Web App - Link
| Component | Technology / Tool |
|---|---|
| Database | Microsoft SQL Server 2025 |
| Languages | SQL, Python |
| Python Libraries | pandas, re, openpyxl |
| IDE / Notebook | Jupyter Notebook |
-
Data Collection
- Collected dataset from Kaggle containing a single column
PAN_Numbers.
- Collected dataset from Kaggle containing a single column
-
Data Import
- Imported the Excel dataset into Microsoft SQL Server using the Import Wizard or
BULK INSERT.
- Imported the Excel dataset into Microsoft SQL Server using the Import Wizard or
-
Data Cleaning
- Removed
NULLvalues. - Trimmed leading and trailing spaces.
- Removed duplicate PAN entries.
- Removed
-
Data Validation
- Validation rules:
- Must be 10 characters long.
- Format:
AAAAA9999A→ 5 letters + 4 digits + 1 letter. - All letters must be A–Z, uppercase.
- All digits must be 0–9.
- Adjacent characters(alphabets) cannot be the same (like AABCD is invalid; AXBCD is valid)
- All five characters cannot form a sequence (like: ABCDE, BCDEF is invalid; ABCDX is valid)
- Validation rules:
-
Data Reporting
- Created summary reports to display:
- ✅ Valid PANs
- ❌ Invalid PANs
- Created summary reports to display:
-
Environment Setup
- Implemented using Jupyter Notebook and Python 3.x.
-
Data Import
- Imported Excel data using the
pandaslibrary.
- Imported Excel data using the
-
Data Cleaning
- Dropped
NULLor empty rows. - Trimmed white spaces.
- Removed duplicate entries.
- Dropped
-
Data Validation
- Used Regular Expressions (Regex) for validation:
^[A-Z]{5}[0-9]{4}[A-Z]{1}$
- Adjacent characters(alphabets) cannot be the same (like AABCD is invalid; AXBCD is valid)
- All five characters cannot form a sequence (like: ABCDE, BCDEF is invalid; ABCDX is valid)
- Used Regular Expressions (Regex) for validation:
-
Data Reporting
- Created an Excel workbook (
PAN_Validation_Report.xlsx) containing:- Original PAN Numbers
- Validation Status (Valid / Invalid)
- Generated summary statistics for valid and invalid PANs.
- Created an Excel workbook (
| PAN_Numbers | Validation_Status |
|---|---|
| ABCDE1234F | Valid |
| AB1234CDE9 | Invalid |
| PQRST6789L | Valid |
An Excel report named PAN VALIDTION RESULT.xlsx is generated containing:
-
PAN_Number -
Status -
Pie chart visualization:
🟩 Valid: 68%
🟥 Invalid: 32%
🌐 3️⃣ Streamlit Web App
The Streamlit App provides an interactive interface for PAN validation.
-
Single PAN Check:
Enter a PAN number to instantly verify validity. -
Bulk Upload Validation:
Upload an Excel file containing a column of PAN numbers.
The app:- Cleans and validates all PANs.
- Displays a summary table of valid/invalid counts.
- Generates a pie chart showing distribution.
- Provides a downloadable Excel file with each PAN’s validation status.
- Summary table showing counts of valid and invalid PANs.
- Pie chart visualization (
Validvs.Invalid). - Downloadable Excel file with columns:
PAN_NumberValidation_Status
This project demonstrates an end-to-end data validation pipeline — from backend validation in SQL and Python to a user-friendly web application built with Streamlit.
- Automated PAN validation using both backend and frontend approaches.
- Improved data integrity through structured cleaning and verification.
- Generated ready-to-use reports for auditing and analysis.
- Interactive dashboard for quick insights and downloads.
- KYC (Know Your Customer) verification systems.
- Tax & compliance validation tools.
- CRM or enterprise data quality improvement.
- Integration with ETL and data ingestion processes.
https://drive.google.com/file/d/1bfYavwEfg9WDDkv4Lg9US4OhPL-GiUwx/view?usp=sharing