abdmath/ai-malware-detector

🛡️ AI Malware Detection System

A complete malware detection and analysis system built on machine learning and static PE analysis, with multiple scanning interfaces. It classifies Windows .exe files as malicious or benign using Random Forest, XGBoost, and ANN models.


🚀 Project Overview

This project uses static PE (Portable Executable) features to train AI models that detect malicious Windows binaries. It supports scanning via:

  • ✅ Command Line Interface (CLI)
  • ✅ Desktop GUI (Tkinter)
  • ✅ Browser App (Streamlit)

The models are trained on the EMBER 2018 malware dataset, using over 600,000 malware and goodware samples, and reach 94.9% to 96.8% accuracy (see Machine Learning Models). The project is structured like a modern malware lab project and is designed to align with malware analyst roles such as SonicWall’s internship program.


📦 Tech Stack

  • Language: Python 3.10
  • ML Libraries: Scikit-learn, XGBoost, TensorFlow/Keras
  • PE Analysis: LIEF (for static binary feature extraction)
  • UI: Streamlit (Web), Tkinter (GUI)
  • Data: EMBER 2018 dataset (600k malware/goodware samples)
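To make "static PE features" concrete, here is a minimal sketch that reads a few header fields straight from a file's bytes without executing it. It uses only the standard library and is not the project's extractor, which relies on LIEF and the EMBER feature set; the function name `read_coff_header` and the choice of fields are assumptions made for illustration.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag: the file is a DLL


def read_coff_header(data: bytes) -> dict:
    """Return a few COFF header fields from the raw bytes of a PE file.

    Illustrative only: the project itself extracts features with LIEF.
    """
    # Every PE file starts with a DOS header whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")

    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")

    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, num_sections, timestamp, _symtab, _num_symbols,
     _opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    return {
        "machine": machine,              # e.g. 0x8664 for x86-64
        "num_sections": num_sections,
        "timestamp": timestamp,          # link time, seconds since the epoch
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

A call such as `read_coff_header(open("path/to/sample.exe", "rb").read())` returns a small dictionary of values. A real feature vector combines many more signals (imports, section statistics, byte histograms, and so on) into the fixed-length array the models consume.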

🧠 Machine Learning Models

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 96.8% | 97.5% | 96.1% | 96.78% |
| XGBoost | 94.9% | 94.5% | 95.3% | 94.95% |
| ANN (Keras) | 96.4% | 96.6% | 96.1% | 96.39% |
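For readers unfamiliar with these four metrics, the sketch below shows how each is derived from confusion-matrix counts, treating "malicious" as the positive class. The function name and the example counts are hypothetical; they are not taken from this project's evaluation, which is produced by `src/compare_models.py`.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from raw counts.

    tp/fp/tn/fn are true/false positives/negatives, with "malicious"
    as the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the files flagged as malware, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real malware, how much was caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```

In malware detection, recall matters because a missed sample (false negative) is usually costlier than a false alarm, which is why it is reported alongside accuracy.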

🖥️ Features

  • 🔍 .exe binary scanning using PE features
  • 🤖 Three ML models trained on a 600k-sample dataset
  • 🧪 Real-world prediction on unknown .exe files
  • 🖥️ CLI, GUI, and browser-based interfaces
  • 📊 Model comparison and performance graphs
  • ⚙️ Supports .pkl and .h5 model loading
  • 🧰 Easily extensible to Cuckoo Sandbox (dynamic analysis)
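Supporting both .pkl and .h5 files usually comes down to choosing a loader by file extension. The following is a minimal sketch of that idea, not the project's actual loading code. It assumes the .pkl files were written with `joblib.dump` (swap in `pickle.load` if they were written with `pickle`) and that the .h5 files are Keras models; the function names are hypothetical.

```python
from pathlib import Path


def model_kind(model_path: str) -> str:
    """Map a model file extension to the loader family it needs."""
    suffix = Path(model_path).suffix.lower()
    if suffix == ".pkl":
        return "pickle"   # scikit-learn / XGBoost estimators
    if suffix == ".h5":
        return "keras"    # Keras ANN saved in HDF5 format
    raise ValueError(f"unsupported model format: {suffix or '(none)'}")


def load_model(model_path: str):
    """Load a trained model, importing heavy libraries only when needed."""
    kind = model_kind(model_path)
    if kind == "pickle":
        # Assumption: the model was saved with joblib.dump.
        # Only unpickle files you trust; unpickling can run arbitrary code.
        import joblib
        return joblib.load(model_path)
    # Assumption: the .h5 file is a full Keras model.
    from tensorflow import keras
    return keras.models.load_model(model_path)
```

Importing `joblib` and `tensorflow` inside the function keeps startup fast for users who only need one of the two model families.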

📷 Screenshots

[Images: confusion_matrix, feature_importance, model_comparison, Screenshot 2025-05-05 012056]


📦 Download Data & Models

⚠️ GitHub does not allow uploading files over 100MB.
Please download all large files (models, datasets, binaries) from this Google Drive link:

🔗 📁 Google Drive – Data + Models

What to do after downloading:

  • Unzip ember_data.zip into the project root as: ember_data/
  • Place all .npy files in the root directory
  • Place trained model files (.pkl, .h5) inside models/
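A quick way to confirm the files landed in the right places is a small layout check like the one below. This is a hypothetical helper, not part of the repository; it only tests for the three locations listed above and does not verify file contents.

```python
from pathlib import Path


def check_layout(root: str = ".") -> list[str]:
    """Return a list of problems with the expected project layout."""
    root_path = Path(root)
    problems = []

    # ember_data.zip should be unzipped to <root>/ember_data/
    if not (root_path / "ember_data").is_dir():
        problems.append("ember_data/ directory not found in project root")

    # Preprocessed feature arrays belong directly in the root directory.
    if not any(root_path.glob("*.npy")):
        problems.append("no .npy files found in project root")

    # Trained models (.pkl, .h5) belong in <root>/models/
    models_dir = root_path / "models"
    has_models = models_dir.is_dir() and any(
        p.suffix.lower() in {".pkl", ".h5"} for p in models_dir.iterdir()
    )
    if not has_models:
        problems.append("no .pkl or .h5 files found in models/")

    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print("Missing:", problem)
```

Running it from the project root prints one line per missing item and nothing when the layout matches.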

🧪 How to Use

▶️ CLI Mode

python src/predict_exe.py "path/to/sample.exe" --model models/random_forest.pkl
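The command above takes a positional file path and a `--model` option. A command-line interface of that shape could be declared with `argparse` roughly as follows. This is a sketch of the argument handling only, not the contents of `src/predict_exe.py`; the default model path and the `validate` helper are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI with one positional path and a --model option."""
    parser = argparse.ArgumentParser(
        description="Classify a Windows executable as malicious or benign.")
    parser.add_argument("exe_path", type=Path,
                        help="path to the .exe file to scan")
    parser.add_argument("--model", type=Path,
                        default=Path("models/random_forest.pkl"),  # assumed default
                        help="trained model file (.pkl or .h5)")
    return parser


def validate(args: argparse.Namespace) -> list[str]:
    """Return human-readable errors for obviously bad arguments."""
    errors = []
    if not args.exe_path.is_file():
        errors.append(f"file not found: {args.exe_path}")
    if args.model.suffix.lower() not in {".pkl", ".h5"}:
        errors.append(f"unsupported model format: {args.model.suffix}")
    return errors


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    problems = validate(cli_args)
    for message in problems:
        print("error:", message)
    # Feature extraction and prediction would follow here.
    raise SystemExit(1 if problems else 0)
```

Validating arguments before loading a model avoids paying the model-loading cost only to fail on a mistyped path.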

💻 Desktop GUI

A simple interface for selecting a .exe file and checking whether it is malicious or benign using your trained models.

▶️ Run the App

python ui/gui_app.py

▶️ Web Browser UI (Streamlit)

streamlit run ui/web_app.py

📁 Folder Structure

AI-Malware-Detector/
│
├── src/                    # ML logic & data handling
│   ├── train_model.py      # Random Forest & XGBoost trainer
│   ├── train_ann.py        # ANN model trainer
│   ├── ember_loader.py     # Loads + processes EMBER data
│   ├── predict_exe.py      # Predicts malware from .exe
│   └── compare_models.py   # Evaluates & visualizes model metrics
│
├── ui/
│   ├── gui_app.py          # Tkinter-based GUI
│   └── web_app.py          # Streamlit browser app
│
├── models/                 # Trained model files (.pkl, .h5)
├── ember_data/             # EMBER dataset (JSONL files)
├── *.npy                   # Preprocessed feature arrays
├── requirements.txt
└── README.md

✨ Future Enhancements

  • Add dynamic analysis via Cuckoo Sandbox (API, registry, behavior)
  • Combine static + dynamic features for a hybrid model
  • Deploy web app online (e.g., Streamlit Cloud or Render)
  • Email alert/report for detected threats
  • Real-time dashboard with threat stats

🤝 Credits
