Welcome to my House Price Prediction Model, which estimates housing prices in Bangalore based on location and other key features. This project applies data preprocessing, feature engineering, and machine learning to build an efficient predictive model.
This project involves data cleaning, exploratory data analysis (EDA), feature selection, and model training, culminating in a Streamlit-based web app for user-friendly predictions.
- Downloaded the "Bengaluru_House_Data.csv" dataset.
- Initial dataset contained the following columns: `area_type`, `availability`, `location`, `size`, `society`, `total_sqft`, `bath`, `balcony`, `price`.
- Removed irrelevant columns: `area_type`, `society`, `balcony`, and `availability` (as they had minimal impact on pricing).
- Handled missing values by removing rows with null values (as they were relatively small in number).
- Standardized the 'size' column by converting values into BHK (removing text variations like "Bedroom" and "BHK").
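
The cleaning steps above could look roughly like the sketch below. It is not the project's actual code: the function name `clean_housing`, the derived column name `bhk`, and the three sample rows are assumptions made for illustration.

```python
import pandas as pd

def clean_housing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns and null rows, then derive a numeric 'bhk' column."""
    # Columns the README lists as removed
    df = df.drop(columns=["area_type", "society", "balcony", "availability"])
    # Drop rows that still contain nulls instead of imputing them
    df = df.dropna().copy()
    # "2 BHK", "4 Bedroom" -> 2, 4: keep the leading integer of 'size'
    # ('bhk' is an assumed column name)
    df["bhk"] = df["size"].str.split().str[0].astype(int)
    return df

# Made-up rows with the same columns as the dataset (not real listings)
sample = pd.DataFrame({
    "area_type": ["Built-up Area", "Plot Area", "Built-up Area"],
    "availability": ["Ready To Move", "Ready To Move", "Ready To Move"],
    "location": ["Whitefield", "Hebbal", "Whitefield"],
    "size": ["2 BHK", "4 Bedroom", "3 BHK"],
    "society": [None, "Hypothetical Society", None],
    "total_sqft": [1056.0, 2600.0, 1440.0],
    "bath": [2.0, 5.0, None],  # the missing value makes the third row drop out
    "balcony": [1.0, 3.0, 2.0],
    "price": [39.07, 120.0, 62.0],
})

cleaned = clean_housing(sample)
print(cleaned[["location", "size", "bhk"]])
```

Dropping `society` before calling `dropna()` matters here: that column is mostly empty in the sample, so dropping nulls first would discard rows that are otherwise complete.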
- Reduced dataset dimensionality by removing locations appearing fewer than 10 times.
- Removed extreme price per square foot outliers (below ₹300/sqft).
- Visualized data using scatter plots and histograms to confirm outlier removal.
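
A minimal sketch of the two filters above, under stated assumptions: `price` is taken to be in lakhs of rupees (hence the `price_unit` factor), and the function name, parameter names, and sample rows are hypothetical. The demo uses a location threshold of 2 instead of 10 so that the tiny sample shows both filters at work.

```python
import pandas as pd

def filter_locations_and_outliers(df, min_count=10, min_ppsf=300, price_unit=100_000):
    """Keep frequent locations, then drop rows with a very low price per sqft."""
    df = df.copy()
    # Assumption: price is stored in lakhs, so convert to rupees first
    df["price_per_sqft"] = df["price"] * price_unit / df["total_sqft"]
    # Remove locations that appear fewer than min_count times
    counts = df["location"].value_counts()
    frequent = counts[counts >= min_count].index
    df = df[df["location"].isin(frequent)]
    # Remove rows priced below the per-sqft floor
    return df[df["price_per_sqft"] >= min_ppsf]

# Made-up rows: location "B" is rare, and the third "A" row is an outlier
sample = pd.DataFrame({
    "location": ["A", "A", "A", "B"],
    "total_sqft": [1000.0, 1200.0, 10000.0, 1000.0],
    "price": [50.0, 60.0, 20.0, 40.0],
})

filtered = filter_locations_and_outliers(sample, min_count=2)
print(filtered)
```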
- Removed unnecessary 'size' and 'price_per_sqft' columns.
- Created dummy variables for categorical 'location' feature.
- Defined 'price' as the target variable and remaining features as inputs.
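
The feature engineering steps above can be sketched as follows. The helper name `build_features`, the `bhk` column name, and the sample rows are assumptions made for illustration; `pd.get_dummies` is one common way to one-hot encode a column and may differ from what the project uses.

```python
import pandas as pd

def build_features(df: pd.DataFrame):
    """Return (X, y): numeric features plus one-hot locations, and the price target."""
    # Drop helper columns that are no longer needed
    df = df.drop(columns=["size", "price_per_sqft"])
    # One dummy column per location
    dummies = pd.get_dummies(df["location"], dtype=int)
    X = pd.concat([df.drop(columns=["location", "price"]), dummies], axis=1)
    y = df["price"]
    return X, y

# Made-up rows shaped like the cleaned data (not real listings)
sample = pd.DataFrame({
    "location": ["Whitefield", "Hebbal", "Whitefield"],
    "size": ["2 BHK", "3 BHK", "3 BHK"],
    "total_sqft": [1056.0, 1500.0, 1440.0],
    "bath": [2.0, 3.0, 2.0],
    "bhk": [2, 3, 3],
    "price": [39.07, 95.0, 62.0],
    "price_per_sqft": [3699.8, 6333.3, 4305.6],
})

X, y = build_features(sample)
print(X.columns.tolist())
```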
- Train-Test Split: 80-20 ratio.
- Initial Model: Linear Regression → Achieved 84% accuracy.
- Applied Cross-Validation (`cross_val_score`) and obtained results: 0.8243, 0.7716, 0.8508, 0.8083, 0.8365 (Average: 0.818, about 82%).
- Hyperparameter Tuning: Used GridSearchCV to evaluate multiple models (Linear Regression, Lasso, Decision Tree Regressor).
- Best Model: Linear Regression with an improved accuracy of 88%.
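
The training workflow above could be sketched as below. This is an illustration on synthetic data, not the project's code: the parameter grids, the random seeds, and the 5-fold setting are assumptions, and the scores it prints will not match the figures in this README. For regressors, `score` and the default `cross_val_score` metric are R², which is presumably what the "accuracy" figures refer to.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real feature matrix and prices
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 0.1, size=200)

# 80-20 train-test split and a baseline linear model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
baseline = LinearRegression().fit(X_train, y_train)
test_r2 = baseline.score(X_test, y_test)  # R^2 on the held-out 20%

# Five cross-validation scores (R^2 by default for regressors)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)

# Candidate models with small, assumed parameter grids
candidates = {
    "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0], "selection": ["cyclic", "random"]}),
    "decision_tree": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [3, 5, None], "splitter": ["best", "random"]},
    ),
}

best_scores = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X, y)
    best_scores[name] = search.best_score_  # mean CV score of the best setting

best_name = max(best_scores, key=best_scores.get)
print(test_r2, cv_scores.mean(), best_name)
```

Comparing models by `best_score_` keeps the comparison on cross-validated data, so the held-out test split stays untouched until a final check.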
- Developed an interactive web app using Streamlit for easy user access and predictions.
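
One way such an app could turn form inputs into a model input row is sketched below. The function names, widget labels, and the feature order (`total_sqft`, `bath`, `bhk`, then one column per location) are assumptions, not the project's actual layout. `run_app` expects an already trained model and the training column names, and is meant to be called from a script launched with `streamlit run`.

```python
import numpy as np

NUMERIC = ("total_sqft", "bath", "bhk")  # assumed numeric feature names

def build_feature_row(columns, location, total_sqft, bath, bhk):
    """Return a 1 x n array: numeric features plus a one-hot location."""
    row = np.zeros(len(columns))
    row[columns.index("total_sqft")] = total_sqft
    row[columns.index("bath")] = bath
    row[columns.index("bhk")] = bhk
    # Locations not seen in training leave every dummy column at 0
    if location in columns:
        row[columns.index(location)] = 1
    return row.reshape(1, -1)

def run_app(model, columns):
    """Render the input form and show a prediction when the button is pressed."""
    import streamlit as st  # imported here so the helper above works without Streamlit

    locations = [c for c in columns if c not in NUMERIC]
    st.title("Bangalore House Price Prediction")
    location = st.selectbox("Location", locations)
    total_sqft = st.number_input("Total area (sqft)", min_value=100.0, value=1000.0)
    bath = st.number_input("Bathrooms", min_value=1, value=2, step=1)
    bhk = st.number_input("BHK", min_value=1, value=2, step=1)
    if st.button("Predict"):
        features = build_feature_row(columns, location, total_sqft, bath, bhk)
        price = model.predict(features)[0]
        st.success(f"Estimated price: {price:.2f}")

# Example of the helper on its own, with hypothetical column names
columns = ["total_sqft", "bath", "bhk", "Hebbal", "Whitefield"]
row = build_feature_row(columns, "Whitefield", 1200, 2, 3)
print(row)
```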
- The final model achieves 88% accuracy after hyperparameter tuning.
- Predicts housing prices efficiently based on user inputs.
- Enables real-time exploration of Bangalore’s housing market trends.
- Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn)
- Machine Learning (Linear Regression, Decision Trees, Lasso)
- Feature Engineering (One-Hot Encoding, Outlier Removal)
- Streamlit (For Deployment)
Feel free to explore the repository and contribute! 🚀
- Devam Singh 😊