Predicting next-day fine particulate matter (PM₂.₅) concentrations using machine learning techniques.
This repository contains code, data workflow, and visualizations from the Kaggle-based notebook for global air quality prediction.
It leverages meteorological and environmental datasets to build high-performance models that forecast PM₂.₅ concentration levels across multiple regions worldwide.
Air pollution remains one of the most critical global challenges.
This project develops a data-driven machine learning framework for next-day PM₂.₅ forecasting using publicly available datasets.
Key Features:
- Data preprocessing and feature engineering from multi-source air quality datasets
- Model training using tree-based ensemble methods (e.g., XGBoost, LightGBM)
- Performance evaluation with RMSE and R² metrics
- SHAP interpretability for feature impact analysis
- Visual dashboards and plots for spatial-temporal understanding
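
As a rough illustration of the feature engineering step, the sketch below builds lag features and a next-day target from daily PM₂.₅ records. It is not taken from the notebook: the function name `add_next_day_target`, the record keys `location`, `date`, and `pm25`, and the choice of one-day and two-day lags are all assumptions made for the example. It uses only the standard library so it runs without dependencies; in pandas the same idea is usually expressed with a per-group `shift`.

```python
from datetime import date, timedelta


def add_next_day_target(rows, lag_days=(1, 2)):
    """Attach lag features and a next-day target to daily PM2.5 records.

    rows: list of dicts with the hypothetical keys "location" (str),
    "date" (datetime.date) and "pm25" (float).
    Rows without a full lag history or a next-day observation are dropped.
    """
    # Index by (location, calendar date) so gaps in a series are respected
    # instead of silently pairing non-consecutive days.
    by_key = {(r["location"], r["date"]): r["pm25"] for r in rows}

    samples = []
    for r in sorted(rows, key=lambda r: (r["location"], r["date"])):
        loc, day = r["location"], r["date"]
        target = by_key.get((loc, day + timedelta(days=1)))
        lags = [by_key.get((loc, day - timedelta(days=k))) for k in lag_days]
        if target is None or any(v is None for v in lags):
            continue  # incomplete history or no next-day value to learn from
        sample = {
            "location": loc,
            "date": day,
            "pm25": r["pm25"],
            "pm25_next_day": target,  # the value the model is trained to predict
        }
        for k, v in zip(lag_days, lags):
            sample[f"pm25_lag{k}"] = v
        samples.append(sample)
    return samples


# Toy data for two hypothetical locations, not real measurements.
start = date(2024, 1, 1)
rows = [
    {"location": "A", "date": start + timedelta(days=i), "pm25": v}
    for i, v in enumerate([10.0, 12.0, 15.0, 11.0, 9.0])
]
rows += [
    {"location": "B", "date": start + timedelta(days=i), "pm25": v}
    for i, v in enumerate([30.0, 28.0, 35.0, 33.0])
]
samples = add_next_day_target(rows)
```

Looking values up by calendar date rather than by row position keeps a missing day from being treated as "yesterday", which matters when monitoring stations report irregularly.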
- Data Aggregation: Merging global PM₂.₅ datasets with weather parameters such as temperature, wind speed, humidity, and pressure.
- Preprocessing & Cleaning: Handling missing values, scaling features, and temporal alignment.
- Model Development: Training machine learning regressors like:
  - XGBoost
  - Random Forest
  - LightGBM
  - Linear Regression
- Evaluation:
  - RMSE, MAE, and R²
  - SHAP-based feature importance visualization
- Forecast Generation: Produces next-day PM₂.₅ predictions for multiple global locations.
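
The evaluation metrics named above (RMSE, MAE, and R²) can be computed from observed and predicted values as in the following minimal sketch. The function name `regression_metrics` and the sample numbers are illustrative assumptions, not results or code from the notebook; libraries such as scikit-learn provide equivalent functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    rmse = math.sqrt(ss_res / n)            # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R² is undefined when the observed values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up PM2.5 values (µg/m³) used only to exercise the function.
observed = [12.0, 18.0, 25.0, 9.0]
predicted = [14.0, 16.0, 22.0, 10.0]
metrics = regression_metrics(observed, predicted)
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of whether errors come from a few bad days or are spread evenly.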
You can explore the interactive visualizations and full notebook here:
View the Project Dashboard
│
├── data/                                    # Raw and processed datasets (not pushed due to size)
├── docs/                                    # HTML outputs for GitHub Pages
│   ├── index.html
│   └── notebook.html
│
├── global-analysis-next-day-pm2-5-ml.ipynb  # Kaggle notebook
├── requirements.txt                         # Environment dependencies
└── README.md                                # Project documentation