# copy_of_task2.py
# -*- coding: utf-8 -*-
"""Copy of Task2.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1R8ZkC-YCyg1lFNAR0DIYo2O4ewNMzujJ
# Task 2
Run the following cell to mount your Google Drive.
"""
from google.colab import drive
drive.mount("/content/gdrive")
"""Run the following cell to import the necessary libraries"""
import pandas as pd
import matplotlib.pyplot as plt
"""## Task 2.1
Aspiring data scientists and machine learning enthusiasts need to understand their coding environment. We normally use a Jupyter notebook for most of our programming. To understand why, let's perform a small exercise.
We are given two datasets containing data on students from two different branches and their respective divisions.
Download the datasets from the links given below -
Task2p1_0 - https://drive.google.com/file/d/1sCGCErnf2iC_qKUHcGLrt2O2HwuiwZYD/view?usp=sharing
Task2p1_1 - https://drive.google.com/file/d/1Kb_TcAEfsUGi-KPZpsh8pXBaxfs4AsHh/view?usp=sharing
Create a folder named 'Task2p1' in your Google Drive and upload the datasets 'Task2p1_0' and 'Task2p1_1' there.
### Python Script
"""
data = pd.read_csv("/content/gdrive/MyDrive/Task2p1/Task2p1_0.csv")
data
data = pd.read_csv("/content/gdrive/MyDrive/Task2p1/Task2p1_1.csv")
data
"""### Jupyter Notebook"""
data = pd.read_csv("/content/gdrive/MyDrive/Task2p1/Task2p1_0.csv")
data
data = pd.read_csv("/content/gdrive/MyDrive/Task2p1/Task2p1_1.csv")
data
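"""A minimal sketch of the difference this exercise is probing, using a small made-up table instead of the Drive files (the column names `name` and `division` are assumptions, not the real dataset's columns). Run as a script, a bare expression such as `data` is evaluated and discarded, so nothing appears unless it is printed explicitly; a notebook cell, by default, displays the value of its last expression.

```python
import contextlib
import io

import pandas as pd

# Hypothetical stand-in for the student data loaded above.
data = pd.DataFrame({"name": ["A", "B"], "division": [1, 2]})


def run_as_script(source):
    # Execute source the way a plain Python script would and capture stdout.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<script>", "exec"), {"data": data})
    return buffer.getvalue()


bare_output = run_as_script("data")            # bare expression: nothing is shown
printed_output = run_as_script("print(data)")  # explicit print: the table is shown
```

In a script, wrapping each bare `data` in `print(...)` would make both tables visible.
"""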
"""Note the difference between the output you get when the code in cell 3 runs as a Python script and the output you get when you run it line by line in a Jupyter notebook.
*Double click here and write your answer in the editor that shows up*
## Task 2.2
"""
from google.colab import drive
drive.mount('/content/drive')
"""Download the dataset of fifty random Harry Potter spells from the link given below -
https://drive.google.com/file/d/1F2YEtVZaorL0WPGxOC5vXkMSixuYkI4v/view?usp=sharing
Create a folder named 'Task2hp' in your Google Drive and upload the dataset there.
Observe the dataset; it is a .csv file. Using the pandas library, read the .csv file into a dataframe in the next cell.
Tip: You can use the Files tab in the sidebar on the left to navigate through your file structure.
"""
import pandas as pd
hp_df = pd.read_csv('/content/drive/MyDrive/Task2hp/HPSpells.csv')
"""Display the first five rows of the dataframe."""
hp_df.head()
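"""The read-and-preview step can be tried without Drive access by parsing CSV text held in memory. This is only a sketch: the rows are placeholders, and the `Name` column is an assumption (only `Type` and `Light` are referred to in this task).

```python
import io

import pandas as pd

# Placeholder CSV text standing in for HPSpells.csv.
csv_text = '''Name,Type,Light
spell_a,Charm,Blue
spell_b,Hex,Red
spell_c,Jinx,
spell_d,Curse,Green
spell_e,Charm,White
spell_f,Transfiguration,Blue
'''

# read_csv accepts a file path or any file-like object.
toy_df = pd.read_csv(io.StringIO(csv_text))
first_five = toy_df.head()  # head() returns the first five rows by default
```
"""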
"""Remove all rows whose Type is 'Hex'."""
hp_df.info()
hp_df.Type
hp_df.drop(hp_df[hp_df.Type=='Hex'].index, inplace=True)
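"""An equivalent way to express the removal is to keep the rows that pass a boolean mask rather than dropping by index in place. A minimal sketch on placeholder data (the spell names are made up):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "Name": ["spell_a", "spell_b", "spell_c", "spell_d"],
    "Type": ["Charm", "Hex", "Jinx", "Hex"],
})

# True for every row whose Type is not 'Hex'.
keep = toy_df["Type"] != "Hex"
without_hex = toy_df[keep].reset_index(drop=True)
```

Unlike `drop(..., inplace=True)`, this leaves `toy_df` untouched and returns a new dataframe.
"""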
"""Now remove the rows from the dataframe having NaN values for column 'Light'."""
hp_df.Light.isna().sum()
hp_df.dropna(subset=['Light'],inplace=True)
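"""To illustrate what `dropna(subset=...)` does, here is a small sketch on placeholder data: only rows with a missing value in the named column are removed, while missing values in other columns are ignored. The `Effect` column is hypothetical.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "Light": ["Blue", np.nan, "Red"],
    "Effect": [np.nan, "stuns", "disarms"],
})

missing_light = int(toy_df["Light"].isna().sum())  # rows with no Light value
cleaned = toy_df.dropna(subset=["Light"])          # drops only those rows
```
"""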
"""Group the spells by their Type and draw a bar graph showing how many spells fall into each of the types - Charm, Jinx, Transfiguration, Curse."""
import matplotlib.pyplot as plt

# Count how many spells belong to each distinct type.
names = hp_df.Type.unique()
freq = {}
for name in names:
    count = 0
    for item in hp_df.Type:
        if item == name:
            count += 1
    freq[name] = count
plt.figure(figsize=(20,7))
plt.bar(range(len(freq)),list(freq.values()))
plt.xticks(range(len(freq)),names)
plt.xlabel('Types of spells')
plt.ylabel('Number of spells')
plt.title('Spells Distribution')
plt.show()
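"""The counting loop above can also be written with pandas' built-in `value_counts`, which returns the frequency of each distinct value in one call. A sketch on placeholder data:

```python
import pandas as pd

types = pd.Series(
    ["Charm", "Jinx", "Charm", "Curse", "Transfiguration", "Charm"],
    name="Type",
)

# Frequency of each distinct type, most frequent first.
counts = types.value_counts()
freq = counts.to_dict()
```

The resulting `counts.index` and `counts.values` could be passed to `plt.bar` in place of the hand-built dictionary.
"""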