24kTanmay/eda-analysis-project

Exploratory Data Analysis (EDA) of Dataset

Author: Tanmay Roy

This project performs Exploratory Data Analysis (EDA) on a car dataset: the data is cleaned, processed, and visualized to uncover useful insights. The analysis follows the steps below:

1. Import Necessary Libraries

We begin by importing the essential EDA libraries, such as pandas, NumPy, and Matplotlib.
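A minimal sketch of this step. The display option is an optional extra, assumed here so that wide DataFrames print every column instead of truncating them.

```python
import numpy as np                 # numerical arrays and NaN handling
import pandas as pd                # loading, cleaning, and summarizing tables
import matplotlib.pyplot as plt    # plotting

# Optional (assumption, not from the original notebook): show all columns when printing.
pd.set_option("display.max_columns", None)
```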

2. Load the Dataset

The dataset is loaded into a pandas DataFrame, and the first 5 and last 5 rows are displayed to get an overview of the data.
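A sketch of loading and previewing the data. The README does not name the data file, so the CSV below is a small made-up stand-in read from a string; the Make and Year columns are assumed. In practice you would call `pd.read_csv` with the real file path.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real (unnamed) CSV file.
csv_text = """Make,Year,Engine HP,MSRP
BMW,2011,335,46135
BMW,2011,300,40650
Audi,2012,220,33000
Ford,2015,310,36000
Honda,2016,174,23000
Toyota,2017,178,24500
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(5))   # first 5 rows
print(df.tail(5))   # last 5 rows
```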

3. Check Data Types

We check the data types of each column in the dataset to ensure correct interpretation for analysis.
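One way to inspect column types, using a small made-up DataFrame. `select_dtypes` is shown as well because the numeric column list is handy for the later summary and correlation steps.

```python
import pandas as pd

# Made-up sample data; column names follow the README's original dataset.
df = pd.DataFrame({
    "Make": ["BMW", "Audi", "Ford"],
    "Year": [2011, 2012, 2015],
    "Engine HP": [335.0, 220.0, 310.0],
    "MSRP": [46135, 33000, 36000],
})

print(df.dtypes)  # one dtype per column (e.g. int64, float64, text)

# Columns pandas treats as numeric, useful for statistics and correlations later.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```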

4. Drop Irrelevant Columns

Columns that are not needed for this analysis (Engine Fuel Type, Market Category, Vehicle Style, Popularity, Number of Doors, and Vehicle Size) are dropped.
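A sketch of dropping the listed columns with `DataFrame.drop`. The single row of values is made up; only the column names come from the README.

```python
import pandas as pd

# One made-up row containing the columns named in the README.
df = pd.DataFrame({
    "Make": ["BMW"], "Engine HP": [335.0],
    "Engine Fuel Type": ["premium unleaded (required)"],
    "Market Category": ["Luxury"], "Vehicle Style": ["Coupe"],
    "Popularity": [3916], "Number of Doors": [2.0],
    "Vehicle Size": ["Compact"], "MSRP": [46135],
})

irrelevant = ["Engine Fuel Type", "Market Category", "Vehicle Style",
              "Popularity", "Number of Doors", "Vehicle Size"]
df = df.drop(columns=irrelevant)  # returns a new DataFrame without these columns
print(df.columns.tolist())        # ['Make', 'Engine HP', 'MSRP']
```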

5. Rename Columns

To simplify the column names, we rename certain columns:

  • Engine HP → HP
  • Engine Cylinders → Cylinders
  • Transmission Type → Transmission
  • Driven_Wheels → Drive Mode
  • highway MPG → MPG-H
  • city mpg → MPG-C
  • MSRP → Price
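The renaming above maps directly onto a dictionary passed to `DataFrame.rename`. This sketch uses an empty DataFrame with the original column names (Make is an assumed extra column that is left unchanged).

```python
import pandas as pd

# Empty frame with the original column names; no data is needed to rename.
df = pd.DataFrame(columns=["Make", "Engine HP", "Engine Cylinders", "Transmission Type",
                           "Driven_Wheels", "highway MPG", "city mpg", "MSRP"])

df = df.rename(columns={
    "Engine HP": "HP",
    "Engine Cylinders": "Cylinders",
    "Transmission Type": "Transmission",
    "Driven_Wheels": "Drive Mode",
    "highway MPG": "MPG-H",
    "city mpg": "MPG-C",
    "MSRP": "Price",
})
print(df.columns.tolist())
# ['Make', 'HP', 'Cylinders', 'Transmission', 'Drive Mode', 'MPG-H', 'MPG-C', 'Price']
```

Columns missing from the mapping (here, Make) keep their names, so the same dictionary works even if the dataset has extra columns.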

6. Drop Duplicate Rows

We display the original shape of the dataset, check for duplicate rows, and then drop them to ensure data integrity.
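A sketch of the duplicate check on made-up rows. The first two rows are identical, so exactly one duplicate is reported and removed.

```python
import pandas as pd

# Made-up rows; rows 0 and 1 are exact duplicates.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Audi"],
    "HP": [335, 335, 220, 250],
    "Price": [46135, 46135, 33000, 35000],
})

print("Original shape:", df.shape)             # (4, 3)
n_dupes = df.duplicated().sum()                # rows identical to an earlier row
print("Duplicate rows:", n_dupes)              # 1
df = df.drop_duplicates()                      # keeps the first occurrence by default
print("Shape after dropping:", df.shape)       # (3, 3)
```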

7. Statistical Summary

We calculate and display summary statistics for all numerical columns: count, mean, standard deviation, minimum, 25th/50th/75th percentiles, and maximum.
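These are the statistics `DataFrame.describe()` returns for numeric columns. A minimal example on made-up values:

```python
import pandas as pd

df = pd.DataFrame({"HP": [200, 300, 400], "Price": [30000, 40000, 50000]})

# Rows: count, mean, std, min, 25%, 50%, 75%, max (one column per numeric column).
summary = df.describe()
print(summary)
```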

8. Handle Missing Values

We display the number of missing (null) values in each column, drop rows with missing data, and confirm the cleanup by checking again.
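A sketch of this check, drop, and re-check cycle on made-up data with two incomplete rows.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 lacks HP, row 2 lacks Cylinders.
df = pd.DataFrame({
    "HP": [335.0, np.nan, 220.0, 310.0],
    "Cylinders": [6.0, 4.0, np.nan, 8.0],
    "Price": [46135, 33000, 36000, 40000],
})

print(df.isnull().sum())   # missing values per column: HP 1, Cylinders 1, Price 0
df = df.dropna()           # drop every row that has at least one missing value
print(df.isnull().sum())   # all zeros after cleaning
```

Dropping rows is the simplest option; filling values (for example with `fillna`) is an alternative when too many rows would be lost.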

9. Data Visualization

Various plots are created to analyze the dataset:

  • Horsepower (HP) vs Price
  • Sales by Year
  • Number of Cars in Each Year
  • Preferred Drive Mode (most common Drive Mode)
  • Highway MPG (MPG-H) vs City MPG (MPG-C)
  • Transmission Type vs MPG-H and MPG-C
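A sketch of four of these plots with Matplotlib, using made-up rows and the renamed column names; the Year column is an assumption. It uses the non-interactive Agg backend so it runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows using the renamed columns.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2015, 2015, 2015],
    "HP": [335, 300, 220, 310, 174, 178],
    "Price": [46135, 40650, 33000, 36000, 23000, 24500],
    "Drive Mode": ["rear wheel drive", "rear wheel drive", "all wheel drive",
                   "front wheel drive", "front wheel drive", "front wheel drive"],
    "MPG-H": [26, 28, 30, 27, 34, 35],
    "MPG-C": [19, 19, 22, 18, 26, 25],
})

cars_per_year = df["Year"].value_counts().sort_index()  # cars per model year
drive_mode_counts = df["Drive Mode"].value_counts()     # most common mode first

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].scatter(df["HP"], df["Price"])
axes[0, 0].set(xlabel="HP", ylabel="Price", title="Horsepower vs Price")

cars_per_year.plot(kind="bar", ax=axes[0, 1], title="Number of Cars in Each Year")
axes[0, 1].set_ylabel("Number of cars")

drive_mode_counts.plot(kind="bar", ax=axes[1, 0], title="Preferred Drive Mode")
axes[1, 0].set_ylabel("Number of cars")

axes[1, 1].scatter(df["MPG-C"], df["MPG-H"])
axes[1, 1].set(xlabel="MPG-C", ylabel="MPG-H", title="Highway vs City MPG")

fig.tight_layout()
# fig.savefig("eda_plots.png")  # or plt.show() in an interactive session
```

The transmission comparison could follow the same pattern, for example by plotting the mean MPG-H and MPG-C per Transmission value as grouped bars.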

10. Correlation Matrix and Heatmap

We filter the numeric columns, calculate the correlation matrix, and plot a heatmap to visualize the relationships between variables.
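A sketch of the correlation step. The README does not say which plotting call draws the heatmap; this version assumes plain Matplotlib `imshow` with the coefficient written in each cell (seaborn's `heatmap` is a common alternative). The data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows; Drive Mode is text and gets filtered out below.
df = pd.DataFrame({
    "HP": [335, 300, 220, 310, 174],
    "Price": [46135, 40650, 33000, 36000, 23000],
    "MPG-H": [26, 28, 30, 27, 34],
    "Drive Mode": ["rear", "rear", "all", "front", "front"],
})

numeric = df.select_dtypes(include="number")  # keep numeric columns only
corr = numeric.corr()                         # Pearson correlation by default

n = len(corr.columns)
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(n))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(n))
ax.set_yticklabels(corr.columns)

# Write each correlation coefficient inside its cell.
for i in range(n):
    for j in range(n):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax)
ax.set_title("Correlation Matrix")
fig.tight_layout()
```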

11. Extra Credits

Additional insights are explored, with creative and experimental analysis to uncover further patterns in the data.

Feel free to explore the dataset and the analysis further to gain deeper insights.

About

This repository is part of the Week 1 assignment of my EEA project, Grid Intelligence.
