This repository contains the Supplementary Material for the book "Applied Machine Learning with Python", written by Andrea Giussani.
You can find details about the book on the BUP website.
The book was written with the following specific versions of some popular libraries:
- scikit-learn version 1.2.2
- pandas version 1.5.3
- xgboost version 1.7.4
- gensim version 3.8.1
- matplotlib version 3.7.1
- seaborn version 0.9.0
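To see how your environment compares with the versions listed above, a minimal sketch using only the standard library is shown here. The helper name `compare_versions` and the `BOOK_VERSIONS` dictionary are hypothetical names introduced for illustration; they are not part of egeaML.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as the ones the book was written with.
BOOK_VERSIONS = {
    "scikit-learn": "1.2.2",
    "pandas": "1.5.3",
    "xgboost": "1.7.4",
    "gensim": "3.8.1",
    "matplotlib": "3.7.1",
    "seaborn": "0.9.0",
}


def compare_versions(expected):
    """Map each package name to an (expected, installed) pair.

    The installed entry is None when the package is not installed.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(BOOK_VERSIONS).items():
        status = "ok" if found == wanted else "differs"
        print(f"{name}: book={wanted} installed={found} [{status}]")
```

A mismatch does not necessarily break the examples; the report only tells you where your setup departs from the one used for the book.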
The book provides a book-specific module, called egeaML.
Be sure you have created a virtualenv. Then run
pip install egeaML
Once installed, you can load a structured, labelled dataset (such as the well-known Boston dataset)
as a pandas.DataFrame, as follows:
from egeaML.datareader import DataReader
raw_data = DataReader(
    filename='https://raw.githubusercontent.com/andreagiussani/datasets/master/egeaML/boston.csv',
    col_target='MEDV'
)
Please note that the codebase is evolving over time; if you want to stick to the print version of the book,
be sure to install egeaML==0.2.3.
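Returning to the loading example: once the dataset is available as a pandas.DataFrame, a common next step is to separate the target column from the features. The sketch below uses plain pandas on a tiny hand-made frame, so the values are made up and it does not rely on any egeaML API; only the column name `MEDV` comes from the example above.

```python
import pandas as pd

# Toy stand-in for a loaded dataset: the values are made up for illustration.
df = pd.DataFrame(
    {
        "RM": [6.1, 5.9, 7.2],
        "LSTAT": [4.5, 9.8, 3.1],
        "MEDV": [24.0, 21.5, 34.2],
    }
)

col_target = "MEDV"

# Separate the target column from the remaining feature columns.
X = df.drop(columns=[col_target])
y = df[col_target]

print(X.shape, y.shape)
```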
Please clone this repo on your local machine, as follows:
git clone https://github.com/andreagiussani/Applied_Machine_Learning_with_Python.git
To install it into your local environment, I recommend creating a virtualenv and adding the necessary requirements by running these commands from your favourite terminal emulator:
pip install -r requirements.txt
pip install git+https://github.com/andreagiussani/Applied_Machine_Learning_with_Python.git
If, instead, you use the Anaconda system:
conda install --file requirements.txt
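pip install git+https://github.com/andreagiussani/Applied_Machine_Learning_with_Python.git
(conda cannot install a package directly from a Git URL, so this second step uses pip inside the active conda environment.)
If you have Python3 already installed in your local environment, you can run: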
python3 -m pip install --upgrade pip
python3 -m pip install git+https://github.com/andreagiussani/Applied_Machine_Learning_with_Python.git
As a developer, you should unit test your contribution.
To do so, you simply need to create a dedicated folder inside the tests subfolder (or possibly extend an existing one),
and test that your method does exactly what you expect. Please take inspiration from the following example:
import unittest
import pandas as pd
from egeaML.datareader import DataReader


class DataIngestionTestCase(unittest.TestCase):
    URL_STRING_NAME = 'https://raw.githubusercontent.com/andreagiussani/datasets/master/egeaML'
    FILENAME_STRING_NAME = 'boston.csv'

    def setUp(self):
        self.col_target = 'MEDV'
        # Join the URL parts with '/' explicitly: os.path.join would insert a backslash on Windows.
        self.filename = f'{self.URL_STRING_NAME}/{self.FILENAME_STRING_NAME}'
        self.columns = [
            'CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE',
            'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV'
        ]
        self.raw_data = DataReader(filename=self.filename, col_target=self.col_target)

    def test__load_dataframe(self):
        # Calling the reader instance returns the dataset.
        df = self.raw_data()
        self.assertIsInstance(df, pd.DataFrame)
        self.assertEqual(df.shape[0], 506)
        self.assertEqual(df.shape[1], 14)

The above unit test checks that the output is of type pandas.DataFrame and
verifies that it has the expected shape (506 rows and 14 columns).
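If you prefer a test that does not depend on network access, the same kind of checks can be run against a small temporary CSV file. The following is only a sketch under stated assumptions: `pandas.read_csv` stands in for the book's DataReader, the class name `LocalCsvTestCase` is hypothetical, and the rows are made up purely to exercise the assertions.

```python
import os
import tempfile
import unittest

import pandas as pd

# Made-up rows, only used to exercise the test; not real Boston data.
CSV_TEXT = "CRIM,RM,MEDV\n0.1,6.0,24.0\n0.2,5.5,21.0\n"


class LocalCsvTestCase(unittest.TestCase):

    def setUp(self):
        # Write the fixture to a temporary file and remove it after each test.
        handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False)
        handle.write(CSV_TEXT)
        handle.close()
        self.path = handle.name
        self.addCleanup(os.remove, self.path)
        self.col_target = "MEDV"

    def test_load_dataframe(self):
        # pandas.read_csv stands in for the book's DataReader here.
        df = pd.read_csv(self.path)
        self.assertIsInstance(df, pd.DataFrame)
        self.assertEqual(df.shape, (2, 3))
        self.assertIn(self.col_target, df.columns)
```

Saved as a file inside the tests subfolder, a test case like this can be run with `python3 -m unittest`.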
If you wish to use the egeaML library in a Jupyter notebook, you first need to install the jupyter library
and then register your environment as a kernel by running the following commands:
pip install jupyter
python3 -m ipykernel install --user --name=<YOUR_ENV>
where <YOUR_ENV> is the name you have assigned to your local environment. You are now ready to use all the features of this helper!
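To confirm that a notebook is actually running on the kernel tied to your environment, you can inspect the interpreter from a notebook cell. This is a minimal sketch using only the standard library; the helper name `describe_interpreter` is hypothetical and not part of egeaML.

```python
import sys


def describe_interpreter():
    """Report which Python interpreter the current process runs on."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv (or a recent virtualenv); conda environments
        # report False here, so inspect `prefix` for those instead.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the printed paths point outside your environment, the notebook is most likely using a different kernel than the one you registered.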
If you have errata for the book, please submit them via the BUP website. If you find a mistake in the book-specific module, you can submit a fix as a pull request to this repository.
@book{giussani2020,
TITLE="Applied Machine Learning with Python",
AUTHOR="Andrea Giussani",
YEAR="2020",
PUBLISHER="Bocconi University Press"
}