Exploratory Data Analysis (EDA) of Dataset

Author: Tanmay Roy

This project performs Exploratory Data Analysis (EDA) on a car dataset (engine specifications, fuel economy, and MSRP) with the aim of cleaning, processing, and visualizing the data to uncover useful insights. The steps followed in this analysis are outlined below:

1. Import Necessary Libraries

We begin by importing the essential libraries for EDA, such as pandas, NumPy, and Matplotlib.

2. Load the Dataset

The dataset is loaded into a pandas DataFrame, and the first 5 and last 5 rows are displayed to get an overview of the data.
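A minimal sketch of this step with pandas. To keep it self-contained, a few hypothetical rows embedded as a string stand in for the real CSV file; with the actual dataset you would pass its file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (illustrative values only).
csv_text = """Make,Model,Year,Engine HP,MSRP
BMW,1 Series M,2011,335,46135
BMW,1 Series,2011,300,40650
Audi,A4,2012,211,32300
Ford,Focus,2015,160,18990
Honda,Civic,2016,158,19475
Toyota,Camry,2017,178,23070
"""

df = pd.read_csv(io.StringIO(csv_text))

# Overview: first five and last five rows.
print(df.head(5))
print(df.tail(5))
```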

3. Check Data Types

We check the data type of each column to make sure every column is interpreted correctly in the analysis.
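A sketch of the dtype check, using a small hypothetical DataFrame in place of the loaded dataset. `select_dtypes` lists the columns pandas treats as numeric, which are the ones later used by `describe()` and `corr()`.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be the loaded dataset.
df = pd.DataFrame({
    "Make": ["BMW", "Audi", "Ford"],
    "Year": [2011, 2012, 2015],
    "Engine HP": [335.0, 211.0, None],  # a missing value forces float64
    "MSRP": [46135, 32300, 18990],
})

# One dtype per column; text columns appear as object (or a string dtype in newer pandas).
print(df.dtypes)

# Columns pandas will treat as numeric.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```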

4. Drop Irrelevant Columns

Columns that are not required for this analysis, such as Engine Fuel Type, Market Category, Vehicle Style, Popularity, Number of Doors, and Vehicle Size, are dropped.
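Dropping the listed columns is a single `DataFrame.drop` call. The one-row frame below is a hypothetical sample that only mimics the dataset's column layout.

```python
import pandas as pd

# Hypothetical one-row sample with the dataset's original column names.
df = pd.DataFrame({
    "Make": ["BMW"], "Model": ["1 Series"], "Year": [2011],
    "Engine Fuel Type": ["premium unleaded (required)"],
    "Engine HP": [335.0], "Engine Cylinders": [6.0],
    "Transmission Type": ["MANUAL"], "Driven_Wheels": ["rear wheel drive"],
    "Number of Doors": [2.0], "Market Category": ["Luxury,Performance"],
    "Vehicle Size": ["Compact"], "Vehicle Style": ["Coupe"],
    "highway MPG": [26], "city mpg": [19], "Popularity": [3916], "MSRP": [46135],
})

# Columns not needed for the analysis.
irrelevant = ["Engine Fuel Type", "Market Category", "Vehicle Style",
              "Popularity", "Number of Doors", "Vehicle Size"]
df = df.drop(columns=irrelevant)
print(df.columns.tolist())
```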

5. Rename Columns

To simplify the column names, we rename certain columns:

Engine HP → HP
Engine Cylinders → Cylinders
Transmission Type → Transmission
Driven_Wheels → Drive Mode
highway MPG → MPG-H
city mpg → MPG-C
MSRP → Price
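The renaming above maps directly onto a dictionary passed to `DataFrame.rename`. An empty frame with the remaining column names is used here as a stand-in for the dataset.

```python
import pandas as pd

# Stand-in frame with the columns left after dropping the irrelevant ones.
df = pd.DataFrame(columns=["Make", "Model", "Year", "Engine HP", "Engine Cylinders",
                           "Transmission Type", "Driven_Wheels", "highway MPG",
                           "city mpg", "MSRP"])

# Old name -> new, shorter name.
df = df.rename(columns={
    "Engine HP": "HP",
    "Engine Cylinders": "Cylinders",
    "Transmission Type": "Transmission",
    "Driven_Wheels": "Drive Mode",
    "highway MPG": "MPG-H",
    "city mpg": "MPG-C",
    "MSRP": "Price",
})
print(df.columns.tolist())
```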

6. Drop Duplicate Rows

We display the original shape of the dataset, check for duplicate rows, and then drop them to ensure data integrity.
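A sketch of the duplicate check on hypothetical rows. By default `duplicated()` flags every repeat of a row after its first occurrence, and `drop_duplicates()` keeps that first occurrence.

```python
import pandas as pd

# Hypothetical sample containing one exact duplicate row.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Ford"],
    "Model": ["1 Series", "1 Series", "A4", "Focus"],
    "Year": [2011, 2011, 2012, 2015],
    "Price": [40650, 40650, 32300, 18990],
})

print("Original shape:", df.shape)
n_dupes = df.duplicated().sum()  # rows identical to an earlier row
print("Duplicate rows:", n_dupes)

df = df.drop_duplicates()
print("Shape after dropping duplicates:", df.shape)
```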

7. Statistical Summary

We calculate and display summary statistics for all numerical columns: count, mean, standard deviation, minimum, percentiles (25th, 50th, 75th), and maximum.
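These statistics are what `DataFrame.describe()` returns for numeric columns; non-numeric columns are skipped by default. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample with two numeric columns and one text column.
df = pd.DataFrame({
    "HP": [335, 300, 211, 160],
    "Price": [46135, 40650, 32300, 18990],
    "Make": ["BMW", "BMW", "Audi", "Ford"],
})

# count, mean, std, min, 25%, 50%, 75%, max for each numeric column.
summary = df.describe()
print(summary)
```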

8. Handle Missing Values

We display the number of missing (null) values in each column, drop the rows that contain missing data, and check again to confirm that no missing values remain.
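A sketch of the count, drop, and re-check sequence on hypothetical data. `isnull().sum()` counts missing values per column, and `dropna()` removes every row with at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value in two different rows.
df = pd.DataFrame({
    "HP": [335.0, np.nan, 211.0, 160.0],
    "Cylinders": [6.0, 6.0, np.nan, 4.0],
    "Price": [46135, 40650, 32300, 18990],
})

missing_before = df.isnull().sum()  # missing values per column
print(missing_before)

df = df.dropna()  # drop rows containing any missing value

print(df.isnull().sum())  # should now be all zeros
```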

9. Data Visualization

Various plots are created to analyze the dataset:

Horsepower (HP) vs Price
Sales by Year
Number of Cars in Each Year
Preferred Drive Mode (most common Drive Mode)
Highway MPG (MPG-H) vs City MPG (MPG-C)
Transmission Type vs MPG-H and MPG-C
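A sketch of several of these plots with Matplotlib, using hypothetical data and the renamed columns. The off-screen `Agg` backend is set only so the snippet runs as a script; in a notebook that line can be dropped. The transmission comparison is shown as the grouped means that such a plot would be built from.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample using the renamed columns.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2015, 2015, 2015],
    "HP": [335, 300, 211, 160, 158, 178],
    "Drive Mode": ["rear wheel drive", "rear wheel drive", "all wheel drive",
                   "front wheel drive", "front wheel drive", "front wheel drive"],
    "Transmission": ["MANUAL", "MANUAL", "AUTOMATIC", "AUTOMATIC", "MANUAL", "AUTOMATIC"],
    "MPG-H": [26, 28, 31, 36, 39, 33],
    "MPG-C": [19, 19, 21, 27, 30, 24],
    "Price": [46135, 40650, 32300, 18990, 19475, 23070],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Horsepower vs price.
axes[0, 0].scatter(df["HP"], df["Price"])
axes[0, 0].set(xlabel="HP", ylabel="Price", title="HP vs Price")

# Number of cars in each year.
cars_per_year = df["Year"].value_counts().sort_index()
cars_per_year.plot(kind="bar", ax=axes[0, 1], title="Number of cars per year")

# Most common drive mode.
drive_counts = df["Drive Mode"].value_counts()
drive_counts.plot(kind="bar", ax=axes[1, 0], title="Cars per Drive Mode")
preferred = drive_counts.idxmax()
print("Most common Drive Mode:", preferred)

# Highway vs city MPG.
axes[1, 1].scatter(df["MPG-C"], df["MPG-H"])
axes[1, 1].set(xlabel="MPG-C", ylabel="MPG-H", title="MPG-H vs MPG-C")

fig.tight_layout()

# Mean highway and city MPG per transmission type.
mpg_by_transmission = df.groupby("Transmission")[["MPG-H", "MPG-C"]].mean()
print(mpg_by_transmission)
```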

10. Correlation Matrix and Heatmap

We filter the numeric columns, calculate the correlation matrix, and plot a heatmap to visualize the relationships between variables.
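A sketch of the correlation step with pandas and plain Matplotlib on hypothetical data. `DataFrame.corr()` computes Pearson correlation by default; seaborn's `heatmap` is a common alternative for the plot, but only Matplotlib is used here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; "Make" is non-numeric and gets filtered out.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Ford", "Honda", "Toyota"],
    "HP": [335, 300, 211, 160, 158, 178],
    "Cylinders": [6, 6, 4, 4, 4, 4],
    "MPG-H": [26, 28, 31, 36, 39, 33],
    "MPG-C": [19, 19, 21, 27, 30, 24],
    "Price": [46135, 40650, 32300, 18990, 19475, 23070],
})

numeric = df.select_dtypes(include="number")
corr = numeric.corr()  # Pearson correlation matrix
print(corr.round(2))

labels = corr.columns
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Write each coefficient inside its cell.
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
```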

11. Extra Credits

Additional insights are explored, with creative and experimental analysis to uncover further patterns in the data.

Feel free to explore the dataset and the analysis further to gain deeper insights.
