Skip to content

Shivmalge/PAN_VALIDATION_USING_SQL_PYTHON

Repository files navigation

🧾 PAN Validation Using SQL and Python


📖 Introduction

The PAN Validation Project automates the process of validating Permanent Account Numbers (PAN) using SQL and Python, and now includes a Streamlit-based web app for easy, interactive validation.

A PAN (Permanent Account Number) is a unique 10-character alphanumeric identifier issued by the Income Tax Department of India. Validating PANs is critical for ensuring data integrity in financial transactions, tax compliance, and KYC processes.

This project provides:

  • Two complementary backend approaches — one using Microsoft SQL Server and another using Python (Pandas).
  • A Streamlit frontend that allows users to validate PANs instantly or upload Excel files for bulk validation and visualization.

🎯 Purpose

  • ✅ Validate PAN numbers based on official structural and logical rules.
  • ✅ Perform data cleaning, validation, and reporting using both SQL and Python.
  • ✅ Compare performance and flexibility between SQL-based and Python-based solutions.
  • ✅ Provide a user-friendly Streamlit interface for instant and batch validation.
  • ✅ Generate reports summarizing valid and invalid PANs visually and in Excel format.

🧩 Project Structure

The project now includes three components:

1️⃣ SQL Implementation
2️⃣ Python Script Implementation
3️⃣ Streamlit Web App - Link


⚙️ Technologies Used

Component Technology / Tool
Database Microsoft SQL Server 2025
Languages SQL, Python
Python Libraries pandas, re, openpyxl
IDE / Notebook Jupyter Notebook

🧮 1️⃣ SQL Implementation

Steps

  1. Data Collection

    • Collected dataset from Kaggle containing a single column PAN_Numbers.
  2. Data Import

    • Imported the Excel dataset into Microsoft SQL Server using the Import Wizard or BULK INSERT.
  3. Data Cleaning

    • Removed NULL values.
    • Trimmed leading and trailing spaces.
    • Removed duplicate PAN entries.
  4. Data Validation

    • Validation rules:
      • Must be 10 characters long.
      • Format: AAAAA9999A → 5 letters + 4 digits + 1 letter.
      • All letters must be A–Z, uppercase.
      • All digits must be 0–9.
      • Adjacent characters(alphabets) cannot be the same (like AABCD is invalid; AXBCD is valid)
      • All five characters cannot form a sequence (like: ABCDE, BCDEF is invalid; ABCDX is valid)
  5. Data Reporting

    • Created summary reports to display:
      • ✅ Valid PANs
      • ❌ Invalid PANs

🐍 2️⃣ Python Implementation

Steps

  1. Environment Setup

    • Implemented using Jupyter Notebook and Python 3.x.
  2. Data Import

    • Imported Excel data using the pandas library.
  3. Data Cleaning

    • Dropped NULL or empty rows.
    • Trimmed white spaces.
    • Removed duplicate entries.
  4. Data Validation

    • Used Regular Expressions (Regex) for validation:
      ^[A-Z]{5}[0-9]{4}[A-Z]{1}$
    • Adjacent characters(alphabets) cannot be the same (like AABCD is invalid; AXBCD is valid)
    • All five characters cannot form a sequence (like: ABCDE, BCDEF is invalid; ABCDX is valid)
  5. Data Reporting

    • Created an Excel workbook (PAN_Validation_Report.xlsx) containing:
      • Original PAN Numbers
      • Validation Status (Valid / Invalid)
    • Generated summary statistics for valid and invalid PANs.

📊 Example Output

✅ SQL Result Example

PAN_Numbers Validation_Status
ABCDE1234F Valid
AB1234CDE9 Invalid
PQRST6789L Valid

🧾 Python Result Example

An Excel report named PAN VALIDTION RESULT.xlsx is generated containing:

  • PAN_Number

  • Status

  • Pie chart visualization:

    🟩 Valid: 68%

    🟥 Invalid: 32%


🌐 3️⃣ Streamlit Web App

The Streamlit App provides an interactive interface for PAN validation.

🔧 Features

  • Single PAN Check:
    Enter a PAN number to instantly verify validity.

  • Bulk Upload Validation:
    Upload an Excel file containing a column of PAN numbers.
    The app:

    • Cleans and validates all PANs.
    • Displays a summary table of valid/invalid counts.
    • Generates a pie chart showing distribution.
    • Provides a downloadable Excel file with each PAN’s validation status.

📊 Example Output

  • Summary table showing counts of valid and invalid PANs.
  • Pie chart visualization (Valid vs. Invalid).
  • Downloadable Excel file with columns:
    • PAN_Number
    • Validation_Status

💡 Conclusion & Application

This project demonstrates an end-to-end data validation pipeline — from backend validation in SQL and Python to a user-friendly web application built with Streamlit.

📈 Key Outcomes

  • Automated PAN validation using both backend and frontend approaches.
  • Improved data integrity through structured cleaning and verification.
  • Generated ready-to-use reports for auditing and analysis.
  • Interactive dashboard for quick insights and downloads.

🌍 Real-World Applications

  • KYC (Know Your Customer) verification systems.
  • Tax & compliance validation tools.
  • CRM or enterprise data quality improvement.
  • Integration with ETL and data ingestion processes.

🎥 Web App Demo

https://drive.google.com/file/d/1bfYavwEfg9WDDkv4Lg9US4OhPL-GiUwx/view?usp=sharing


👨‍💻 Developed By - Shivsharan Malage

🌐Github

💼Linkedin

About

PAN VALIDATION Using SQL and PYTHON

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published