This project aims to predict house prices in Ames, Iowa, based on a dataset with 79 explanatory variables.
It demonstrates an end-to-end data science workflow, from data exploration through feature engineering, model training, and evaluation, to stacking.
On Kaggle, this solution ranked in the top 25% of the leaderboard.
- Source: Kaggle Ames Housing Dataset
- 1460 observations, 79 features (categorical, ordinal, and continuous).
- Target: SalePrice.
- Exploratory Data Analysis (EDA) – distributions, correlations, missing values.
- Feature Engineering – handling NAs, log transforms, encoding categorical variables, scaling.
- Modeling – regression models (Linear Regression, Ridge, Lasso, Random Forest, XGBoost, LightGBM).
- Stacking – ensemble with Ridge as meta-model.
- Evaluation – RMSE, R², cross-validation.
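The feature-engineering step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names are stand-ins for the real Kaggle schema, and the values are toy data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Ames data (columns are illustrative).
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],
    "GarageType": ["Attchd", "Attchd", None, "Detchd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# 1. Missing values: for many Ames columns, NA means the feature is absent,
#    so fill with an explicit "None" category rather than dropping rows.
df["GarageType"] = df["GarageType"].fillna("None")

# 2. Log transform: SalePrice is right-skewed, so model log(1 + price);
#    this also matches the log-RMSE metric used for evaluation.
df["SalePrice"] = np.log1p(df["SalePrice"])

# 3. Encode categoricals as one-hot dummy columns.
df = pd.get_dummies(df, columns=["Neighborhood", "GarageType"])
```

Continuous features would additionally be scaled before fitting the linear models (Ridge, Lasso), where scale matters; tree-based models are insensitive to it.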
- Baseline (Linear Regression): RMSE ~0.21 (log error).
- Tree-based models (XGBoost, LightGBM): significantly improved over the baseline.
- Final stacking (Ridge meta-model): Top 25% Kaggle Leaderboard.
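The final stacking step can be sketched with scikit-learn's `StackingRegressor`, with Ridge as the meta-model as in the project. This is a hedged sketch on synthetic data, not the actual submission code: the base estimators and hyperparameters are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the engineered Ames features.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=10.0)),
        ("lasso", Lasso(alpha=0.001, max_iter=5000)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=Ridge(),  # Ridge meta-model, as in the project
    cv=5,  # out-of-fold predictions feed the meta-model
)

# Cross-validated RMSE; on the real data this is computed on log(SalePrice).
scores = cross_val_score(stack, X, y, scoring="neg_root_mean_squared_error", cv=3)
rmse = -scores.mean()
```

In the actual pipeline the gradient-boosted models (XGBoost, LightGBM) would appear among the base estimators; they are omitted here to keep the sketch dependent on scikit-learn alone.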
```
kaggle-house/
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
├── data/
│   ├── raw/              # Original Kaggle dataset
│   └── processed/        # Cleaned & feature-engineered dataset
├── notebooks/
│   ├── 01_EDA.ipynb
│   ├── 02_FeatureEng.ipynb
│   ├── 03_Modeling.ipynb
│   ├── 04_Evaluation.ipynb
│   └── 05_Stacking.ipynb
├── src/
│   ├── data_prep.py
│   ├── train.py
│   └── evaluate.py
├── models/
│   ├── baseline_model.pkl
│   ├── final_model.pkl
│   └── model_card.md
└── reports/
    └── figures/
```
```bash
git clone https://github.com/driksey/ames-housing-price-prediction.git
cd ames-housing-price-prediction
pip install -r requirements.txt
```