Welcome to the Wine Quality Prediction project! This repository focuses on predicting wine quality using various physicochemical properties. By employing machine learning techniques such as Random Forest and Gradient Boosting Classifier, we aim to identify the key factors that influence wine quality and build a reliable predictive model for wine classification.
- Project Overview
- Technologies Used
- Installation
- Usage
- Data Visualization
- Exploratory Data Analysis
- Feature Selection
- Machine Learning Models
- Results
- Contributing
- License
- Contact
- Releases
The primary goal of this project is to create a predictive model that classifies wines based on their quality. The model uses various physicochemical properties as input features. Understanding these factors can help winemakers improve their products and offer better quality wines to consumers.
- Analyze the dataset to understand the distribution of wine quality.
- Identify significant features that affect wine quality.
- Build and evaluate machine learning models to predict wine quality.
This project utilizes the following technologies:
- Python 3: The main programming language used for data analysis and model building.
- Pandas: A library for data manipulation and analysis.
- NumPy: A library for numerical computing.
- Matplotlib: A plotting library for creating visualizations.
- Seaborn: A statistical data visualization library based on Matplotlib.
- Scikit-learn: A machine learning library for building models.
- Random Forest: An ensemble learning method for classification.
- Gradient Boosting: An ensemble method that adds trees sequentially, each one correcting the errors of the previous trees.
To get started with this project, follow these steps:
- Clone the repository:
  git clone https://github.com/CODEMONING/Wine-Quality-Prediction.git
- Navigate to the project directory:
  cd Wine-Quality-Prediction
- Install the required packages:
  pip install -r requirements.txt
After installation, you can run the main script to start the analysis:
python main.py
Make sure to check the dataset and adjust any parameters as needed.
Visualizing data is crucial for understanding patterns and trends. In this project, we use Matplotlib and Seaborn to create various plots, including:
- Histograms for distribution of wine quality.
- Box plots to identify outliers.
- Correlation heatmaps to show relationships between features.
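The first two plot types in the list can be sketched as follows. This is a minimal illustration, not the project's actual plotting code: the helper name `plot_quality_overview`, the choice of `alcohol` as the example feature, and the synthetic stand-in data are all assumptions made so the snippet runs without the CSV file. It uses plain Matplotlib and the pandas plotting wrapper.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_quality_overview(data, feature="alcohol"):
    """Draw a quality histogram and a per-quality box plot of one feature.

    Hypothetical helper: expects a 'quality' column plus the named feature.
    """
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(12, 4))

    # Histogram: bin edges at half-integers give one bar per quality score
    edges = np.arange(data["quality"].min(), data["quality"].max() + 2) - 0.5
    ax_hist.hist(data["quality"], bins=edges, edgecolor="black")
    ax_hist.set_xlabel("quality")
    ax_hist.set_ylabel("number of wines")
    ax_hist.set_title("Distribution of Wine Quality")

    # Box plot: points drawn beyond the whiskers are candidate outliers
    data.boxplot(column=feature, by="quality", ax=ax_box)
    ax_box.set_title(f"{feature} by quality")
    fig.suptitle("")  # remove the automatic pandas group title
    return fig


# Synthetic stand-in data (assumption), used only to make the sketch runnable
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "quality": rng.integers(3, 9, size=200),
    "alcohol": rng.normal(10.5, 1.0, size=200),
})
fig = plot_quality_overview(demo)
```

To use it on the real data, pass the loaded DataFrame instead of `demo` and call `plt.show()` afterwards.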
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset (pass sep=';' if your copy of the file is semicolon-delimited)
data = pd.read_csv('winequality-red.csv')
# Create a correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap of Wine Features')
plt.show()
Exploratory Data Analysis (EDA) helps us understand the dataset better. We explore:
- Distribution of wine quality ratings.
- Relationships between physicochemical properties and wine quality.
- Missing values and data cleaning.
- Most wines have a quality rating between 5 and 7.
- Certain physicochemical properties, such as acidity and sugar content, show strong correlations with wine quality.
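The checks listed above can be reproduced with a few pandas calls. The sketch below is illustrative only: the `summarize` helper and the synthetic data are assumptions, and the column names `alcohol` and `residual sugar` are used on the assumption that your file follows the usual wine quality column naming.

```python
import numpy as np
import pandas as pd


def summarize(data, target="quality"):
    """Return rating counts, missing-value counts, and correlations with the target."""
    # How many wines received each quality rating
    rating_counts = data[target].value_counts().sort_index()
    # Number of missing values per column
    missing = data.isna().sum()
    # Pearson correlation of each feature with the target, strongest first
    corr = data.corr()[target].drop(target)
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
    return rating_counts, missing, corr


# Synthetic stand-in data (assumption): quality loosely follows alcohol
rng = np.random.default_rng(0)
alcohol = rng.normal(10.5, 1.0, 300)
quality = np.clip(
    np.round(5.5 + 0.8 * (alcohol - 10.5) + rng.normal(0, 0.5, 300)), 3, 8
).astype(int)
demo = pd.DataFrame({
    "alcohol": alcohol,
    "residual sugar": rng.normal(2.5, 1.0, 300),
    "quality": quality,
})
demo.loc[0, "residual sugar"] = np.nan  # inject one missing value

rating_counts, missing, corr = summarize(demo)
print(rating_counts)
print(missing)
print(corr)
```

Sorting by absolute correlation keeps strongly negative relationships near the top, which a plain descending sort would push to the bottom.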
Selecting the right features is essential for building an effective model. We use techniques like:
- Correlation analysis to identify important features.
- Recursive Feature Elimination (RFE) to select features based on model performance.
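A minimal sketch of RFE with scikit-learn is shown below. It assumes a Random Forest as the ranking estimator and keeps two features; neither choice is stated in this README, and the synthetic data (two informative columns, two noise columns) exists only to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in data (assumption): only the first two columns carry signal
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, n),
    "volatile acidity": rng.normal(0.5, 0.2, n),
    "noise_a": rng.normal(0.0, 1.0, n),
    "noise_b": rng.normal(0.0, 1.0, n),
})
y = ((X["alcohol"] - 10.5) - 3 * (X["volatile acidity"] - 0.5) > 0).astype(int)

# RFE repeatedly fits the estimator and drops the least important feature
# until only n_features_to_select remain
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=42),
    n_features_to_select=2,
)
selector.fit(X, y)

selected = list(X.columns[selector.support_])
print("Selected features:", selected)
print("Ranking (1 = kept):", dict(zip(X.columns, selector.ranking_)))
```

On the real dataset, `X` would be the feature columns and `y` the `quality` column; the number of features to keep is a tuning choice.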
We implement several machine learning models to predict wine quality:
Random Forest is an ensemble method that uses multiple decision trees to improve accuracy.
from sklearn.ensemble import RandomForestClassifier
# Create and train the model (X_train and y_train come from a prior train/test split)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
Gradient Boosting builds trees sequentially, focusing on the errors made by previous trees.
from sklearn.ensemble import GradientBoostingClassifier
# Create and train the model (X_train and y_train come from a prior train/test split)
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_model.fit(X_train, y_train)
After training the models, we evaluate their performance using metrics such as accuracy, precision, and recall.
- Random Forest: Achieved an accuracy of 90%.
- Gradient Boosting: Achieved an accuracy of 92%.
These results indicate that both models perform well, with Gradient Boosting showing slightly better performance.
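The evaluation step can be sketched as follows. This is an assumption-laden illustration, not the script that produced the figures above: the 80/20 stratified split, macro averaging for precision and recall, the `evaluate` helper, and the synthetic data are all choices made here for the example, so the printed scores will not match the reported ones.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return accuracy, macro precision, and macro recall on held-out data."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    }


# Synthetic stand-in data (assumption): three quality classes labelled 5, 6, 7
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-0.5, 0.5]) + 5

# Hold out 20% of the rows, keeping class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = evaluate(model, X_test, y_test)
    print(name, scores[name])
```

Macro averaging weights every quality class equally, which matters when some ratings are rare; a weighted average is a reasonable alternative if overall hit rate is the main concern.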
We welcome contributions to improve this project. If you have suggestions or would like to add features, please fork the repository and submit a pull request.
- Fork the repository.
- Create a new branch for your feature.
- Make your changes and commit them.
- Push your changes and create a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
For questions or suggestions, please reach out:
- Email: contact@example.com
- GitHub: Your GitHub Profile
You can find the latest releases of this project here. Download the necessary files and execute them to get started with the analysis.
Feel free to explore the "Releases" section for updates and new features.
Thank you for your interest in the Wine Quality Prediction project! We hope you find it useful and informative. Happy coding! 🍷