SmartDataCleaner v2.0.0 – Professional Python desktop application for automatic CSV data cleanup, type normalization, missing value handling, duplicate removal, and heuristic data quality analysis with export to TXT, JSON, and PDF.

SmartDataCleaner v2.0.0 – Professional Data Cleanup & Preprocessing Tool (Full Source Code)

SmartDataCleaner v2.0.0 is a powerful Python desktop application for automatic data cleanup and preprocessing.
This repository contains the full source code, allowing you to customize data cleaning logic, column normalization, UI behavior, and export formats for analytics, research, or machine learning workflows.


🌟 SCREENSHOT

SmartDataCleaner Main Interface


🌟 FEATURES

  • 📂 CSV File Input — Load any CSV file for processing
  • 🧹 Automatic Data Cleanup — Handles missing values, duplicates, and type inconsistencies
  • 🧠 Heuristic Scoring Engine — Detects high-risk columns based on data quality
  • 📝 Column Normalization — Suggests snake_case renaming for consistency
  • 🔢 Type Normalization — Numeric and string type corrections
  • 📈 Duplicate Detection & Removal — Automatically identifies repeated entries
  • 🧵 Multithreaded Execution — Responsive UI during large dataset processing
  • 🖱️ Interactive Results Table — Review heuristic scores, rename suggestions, and column stats
  • 📄 Export Results — Save cleanup results as TXT, JSON, or PDF
  • 📑 Professional PDF Reports — Clean formatting with pagination and color-coded heuristics
  • 🎨 Modern Dark UI — Built with Tkinter + ttkbootstrap
  • ⚙️ Fully Customizable — Adjust imputation methods, scoring logic, UI layout, or export behavior
  • 📘 Built-In About / Guide — Usage tips, features, and developer info included
  • 🔒 Local Processing Only — No internet access or data transmission
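The cleanup features above can be illustrated with a minimal pandas sketch. It is not the application's actual code: the function name `clean_frame` and the order of operations (duplicates first, then imputation) are assumptions made for illustration. The mean/mode imputation rule follows the one stated in the Notes section.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    """Hypothetical helper: drop duplicate rows, then fill missing values."""
    # Remove fully repeated rows and count how many were dropped.
    deduped = df.drop_duplicates().reset_index(drop=True)
    removed = len(df) - len(deduped)

    for col in deduped.columns:
        series = deduped[col]
        if not series.isna().any():
            continue  # nothing to impute in this column
        if pd.api.types.is_numeric_dtype(series):
            # Numeric columns: fill with the column mean.
            deduped[col] = series.fillna(series.mean())
        else:
            # Other columns: fill with the most frequent value, if any.
            modes = series.mode(dropna=True)
            if not modes.empty:
                deduped[col] = series.fillna(modes.iloc[0])
    return deduped, removed


df = pd.DataFrame(
    {
        "age": [20.0, None, 40.0, 40.0],
        "city": ["Oslo", None, "Oslo", "Oslo"],
    }
)
cleaned, removed = clean_frame(df)
```

Whether duplicates should be removed before or after imputation depends on the dataset, so treat the ordering here as one reasonable choice rather than the tool's behavior.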

🚀 INSTALLATION

  1. Clone or download this repository:

git clone https://github.com/rogers-cyber/SmartDataCleaner.git
cd SmartDataCleaner

  2. Install required Python packages:

pip install pandas numpy ttkbootstrap reportlab

(Tkinter ships with the python.org installers for Windows and macOS; on some Linux distributions it is a separate package, such as python3-tk on Debian and Ubuntu.)

  3. Run the application:

python SmartDataCleaner.py

  4. Optional: Build a standalone executable using PyInstaller:

pyinstaller --onefile --windowed SmartDataCleaner.py
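
Before running or packaging the application, it can help to confirm that the required modules are discoverable. The snippet below is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is taken from the install command above plus `tkinter`. It only checks that each module can be located, not that it imports cleanly.

```python
from importlib.util import find_spec

# Top-level modules named in the installation step, plus tkinter.
REQUIRED = ("pandas", "numpy", "ttkbootstrap", "reportlab", "tkinter")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```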


💡 USAGE

  1. Select CSV File:

    • Click 📄 CSV File to choose your dataset.
  2. Start Cleanup:

    • Click 🧹 CLEAN DATA
    • Monitor heuristic scores, missing values, duplicates, and type suggestions.
  3. Stop Cleanup:

    • Click 🛑 STOP to safely interrupt processing at any time.
  4. Review Results:

    • Columns are displayed in the results table with:
      • Original type → Cleaned type
      • Missing values count
      • Duplicates removed
      • Heuristic score
      • Suggested rename
  5. Export Results:

    • Click 📃 TXT, 📄 JSON, or 📄 PDF to export cleanup reports
    • Receive confirmation pop-ups after export
  6. About / Guide:

    • Click ℹ About for usage instructions, heuristics overview, and developer info
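
The start/stop workflow above is commonly built on a background thread that checks a stop flag between units of work. The following is a minimal sketch of that pattern, not the application's implementation; `clean_columns` and the callback `on_result` are hypothetical names.

```python
import threading


def clean_columns(columns, stop_event, on_result):
    """Process columns one at a time, honoring a stop request between columns."""
    done = 0
    for name in columns:
        if stop_event.is_set():
            break  # the user asked to stop; exit before the next column
        on_result(name)  # placeholder for the real per-column work
        done += 1
    return done


stop_event = threading.Event()
results = []

# Run the work off the main thread so a GUI event loop would stay responsive.
worker = threading.Thread(
    target=clean_columns,
    args=(["a", "b", "c"], stop_event, results.append),
    daemon=True,
)
worker.start()
worker.join()

# A STOP button handler would simply call stop_event.set().
```

In a Tkinter application, widgets are generally best updated from the main thread, so a real worker would typically pass results back through a `queue.Queue` or schedule updates with `after()` rather than touching the UI directly.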

⚙️ CONFIGURATION OPTIONS

| Option        | Description                                                         |
|---------------|---------------------------------------------------------------------|
| CSV File      | Load a dataset for cleanup                                          |
| Start Cleanup | Begin heuristic analysis and automatic cleaning                     |
| Stop Cleanup  | Safely halt processing                                              |
| Results Table | Interactive view of column stats, scores, and suggestions           |
| Export TXT    | Plain-text report of cleaned data                                   |
| Export JSON   | Structured cleanup results for automation or analysis               |
| Export PDF    | Professional report with title, pagination, and color-coded scores  |
| About / Guide | Built-in usage instructions and tool overview                       |


📦 OUTPUT FORMATS

  • TXT — Plain-text report of column stats and suggestions
  • JSON — Structured data for further analysis or automation
  • PDF — Professional report with color-coded heuristic scores, column info, and pagination
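
As an illustration of the TXT and JSON formats, the sketch below serializes per-column results with the standard library. The field names (`column`, `original_type`, `cleaned_type`, `missing`, `duplicates_removed`, `score`, `suggested_name`), the sample values, and the helper names are all assumptions chosen to mirror the results table; the actual export schema may differ.

```python
import json

# Hypothetical per-column results mirroring the results table fields.
rows = [
    {
        "column": "Unit Price",
        "original_type": "object",
        "cleaned_type": "float64",
        "missing": 3,
        "duplicates_removed": 2,
        "score": 0.4,
        "suggested_name": "unit_price",
    }
]


def to_json(report_rows):
    """Serialize the results as indented JSON text."""
    return json.dumps(report_rows, indent=2, ensure_ascii=False)


def to_txt(report_rows):
    """Render one plain-text line per column."""
    lines = []
    for row in report_rows:
        lines.append(
            f"{row['column']}: {row['original_type']} -> {row['cleaned_type']}, "
            f"missing={row['missing']}, score={row['score']}, "
            f"rename={row['suggested_name']}"
        )
    return "\n".join(lines)


json_text = to_json(rows)
txt_text = to_txt(rows)
```

Writing either string to disk is then a matter of `open(path, "w", encoding="utf-8")`.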

📦 DEPENDENCIES

  • Python 3.10+
  • pandas — Data processing and type handling
  • numpy — Numeric computations
  • ttkbootstrap — Modern themed UI
  • reportlab — PDF report generation
  • Tkinter — Standard Python GUI framework
  • threading (standard library) — Background cleanup execution
  • os / sys (standard library) — Platform-aware file handling

📝 NOTES

  • SmartDataCleaner performs all processing locally
  • No data is transmitted externally
  • Heuristic scores help prioritize columns needing attention
  • Column renaming suggestions enforce consistent snake_case formatting
  • Missing numeric values are imputed with the column mean; missing string values are imputed with the column mode (the most frequent value)
  • Error logs are written to datacleaner.log
  • Fully portable when compiled as a standalone executable
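
The snake_case rename suggestions mentioned in the notes above could be produced along these lines. This is a sketch under assumptions: the function name `suggest_snake_case` and the exact rules (split camelCase boundaries, collapse non-alphanumeric runs into underscores, lowercase the result) are illustrative and may not match the tool's own logic.

```python
import re


def suggest_snake_case(name: str) -> str:
    """Hypothetical helper: turn a column header into a snake_case suggestion."""
    s = name.strip()
    # Insert an underscore at lower-to-upper boundaries, e.g. unitPrice.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    # Collapse every run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


suggestions = {
    header: suggest_snake_case(header)
    for header in ["First Name", "orderID", " Total-Amount ($) ", "already_snake"]
}
```

Acronym-heavy headers such as `HTTPCode` are not split by this simple rule and would need extra handling.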

👤 ABOUT

SmartDataCleaner v2.0.0 is maintained by Mate Technologies, delivering practical Python-based productivity and analytics tools.

Website: https://matetools.gumroad.com


📜 LICENSE

Distributed as commercial source code.
You may use it for personal or commercial projects.
Redistribution, resale, or rebranding as a competing product is not allowed.
