Add Drift Detector Benchmarker Prototype by yuji3w · Pull Request #17 · loglabs/ttb

yuji3w · 2022-04-19T03:40:25Z

The objective of the drift detector benchmarker is to create an easy-to-use framework for benchmarking data drift detection methods. The benchmarker currently uses the NYC taxi dataset.

Key features:

Created a benchmarking framework that allows easy testing of the following drift detectors, all added in this PR:

ClassifierDriftModel
ClassifierUncertaintyModel
CVMModel
FETModel
LearnedKernelModel
MMDModel
ModelWrapperInterface
SpotTheDiffModel
TabularModel
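A minimal sketch of the kind of harness described here: every detector runs on the same reference and current batch, and the harness records the drift verdict and the wall-clock runtime. The `detect` method, `MeanShiftDetector`, and `run_benchmark` are hypothetical stand-ins. The wrapper names above mirror detectors in alibi-detect (e.g. `MMDDrift`, `CVMDrift`, `TabularDrift`), but whether and how the PR delegates to that library is an assumption.

```python
import statistics
import time
from dataclasses import dataclass


# Hypothetical toy detector: flags drift when the current mean moves more than
# `threshold` reference standard deviations away from the reference mean.
@dataclass
class MeanShiftDetector:
    threshold: float = 2.0

    def detect(self, reference: list[float], current: list[float]) -> bool:
        ref_mean = statistics.mean(reference)
        ref_std = statistics.stdev(reference)
        shift = abs(statistics.mean(current) - ref_mean)
        return shift > self.threshold * ref_std


def run_benchmark(detectors: dict, reference: list[float], current: list[float]) -> list[dict]:
    """Run every detector on the same data and record its verdict and runtime."""
    rows = []
    for name, detector in detectors.items():
        start = time.perf_counter()
        is_drift = detector.detect(reference, current)
        elapsed = time.perf_counter() - start
        rows.append({"detector": name, "is_drift": is_drift, "runtime_s": elapsed})
    return rows


reference = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
current = [15.0, 16.2, 14.8, 15.5, 15.1, 14.9]
results = run_benchmark(
    {"strict": MeanShiftDetector(threshold=1.0), "lenient": MeanShiftDetector(threshold=50.0)},
    reference,
    current,
)
for row in results:
    print(row["detector"], row["is_drift"])
```

Keeping every detector behind the same call makes the per-detector runtimes directly comparable.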

Created a graphing utility for plotting inter-week drift at the feature level.
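A sketch of the numbers behind such a plot. It assumes drift between consecutive ISO weeks is scored per feature with the two-sample Kolmogorov-Smirnov statistic. The record layout (`pickup_date`), the `weekly_feature_drift` helper, and the choice of KS are assumptions rather than the PR's implementation. The plotting step itself (e.g. one line per feature) is omitted.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in set(a) | set(b):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def weekly_feature_drift(records: list[dict], features: list[str]) -> dict:
    """Return {feature: [(week_a, week_b, ks), ...]} for consecutive ISO weeks."""
    by_week = defaultdict(list)
    for rec in records:
        iso = rec["pickup_date"].isocalendar()
        by_week[(iso[0], iso[1])].append(rec)  # key: (ISO year, ISO week)
    weeks = sorted(by_week)
    drift = {}
    for feature in features:
        drift[feature] = [
            (w1, w2, ks_statistic([r[feature] for r in by_week[w1]],
                                  [r[feature] for r in by_week[w2]]))
            for w1, w2 in zip(weeks, weeks[1:])
        ]
    return drift


# Hypothetical records: weeks 1 and 2 look alike, week 3 shifts upward.
records = [
    {"pickup_date": date(2022, 1, 3), "trip_distance": 1.0},
    {"pickup_date": date(2022, 1, 4), "trip_distance": 2.0},
    {"pickup_date": date(2022, 1, 10), "trip_distance": 1.0},
    {"pickup_date": date(2022, 1, 11), "trip_distance": 2.0},
    {"pickup_date": date(2022, 1, 17), "trip_distance": 8.0},
    {"pickup_date": date(2022, 1, 18), "trip_distance": 9.0},
]
drift = weekly_feature_drift(records, ["trip_distance"])
print(drift["trip_distance"])
```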

Created an accessor class and factory for easy access to DataFrame data.

Created a batch loader for loading large date ranges (especially ranges longer than one year).
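One way to batch a long date range is to split it into fixed-size, half-open windows and load each window separately, so no single read covers more than a bounded span. The following is a minimal sketch that assumes a hypothetical `load_batch(start, end)` callable supplied by the caller. The 180-day batch size in the example is arbitrary.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def date_batches(start: date, end: date, days: int = 90) -> Iterator[tuple[date, date]]:
    """Yield half-open [batch_start, batch_end) windows covering [start, end)."""
    if days <= 0:
        raise ValueError("days must be positive")
    batch_start = start
    while batch_start < end:
        batch_end = min(batch_start + timedelta(days=days), end)
        yield batch_start, batch_end
        batch_start = batch_end


def load_range(start: date, end: date, load_batch: Callable[[date, date], list], days: int = 90) -> list:
    """Load [start, end) batch by batch and concatenate the results."""
    rows = []
    for batch_start, batch_end in date_batches(start, end, days):
        rows.extend(load_batch(batch_start, batch_end))
    return rows


# Hypothetical loader that returns one "row" (a date) per day in the window.
def fake_loader(s: date, e: date) -> list:
    return [s + timedelta(days=i) for i in range((e - s).days)]


windows = list(date_batches(date(2020, 1, 1), date(2021, 7, 1), days=180))
rows = load_range(date(2020, 1, 1), date(2021, 7, 1), fake_loader, days=180)
print(len(windows), len(rows))
```

Half-open windows ensure that adjacent batches never load the same day twice.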

lib/data_clean_utils.py

model.ipynb

1. Add F1 and accuracy scores to CSV
2. Aggregate CSVs
3. Track runtimes for each method
4. All methods now ready for use
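A minimal sketch of the bookkeeping behind items 1 and 2. It computes accuracy and F1 for binary predictions, writes one CSV row per run, and then aggregates several per-run CSVs into one table. The column names, the week labels, and the use of in-memory files are hypothetical, since the PR's CSV layout is not shown here.

```python
import csv
import io


def accuracy_and_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy and F1 (positive class = 1) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


FIELDS = ["week", "accuracy", "f1"]


def write_run_csv(week: str, y_true: list[int], y_pred: list[int]) -> str:
    """Return the CSV text for one run (a real pipeline would write a file)."""
    acc, f1 = accuracy_and_f1(y_true, y_pred)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow({"week": week, "accuracy": acc, "f1": f1})
    return buf.getvalue()


def aggregate_csvs(csv_texts: list[str]) -> list[dict]:
    """Concatenate the rows of several per-run CSVs (values come back as strings)."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


runs = [
    write_run_csv("2022-W01", [1, 0, 1, 1], [1, 0, 0, 1]),
    write_run_csv("2022-W02", [0, 0, 1, 1], [0, 1, 1, 0]),
]
table = aggregate_csvs(runs)
for row in table:
    print(row)
```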

review-notebook-app · 2022-04-25T21:28:09Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

`tip_amount` leaks information to `tip_percent_greater_15`

shreyashankar · 2022-04-27T21:48:52Z

lib/data_clean_utils.py

-    df.loc[df['vendorid'] == '2', 'vendorid'] = 2
-    df.loc[df['vendorid'] == '2.0', 'vendorid'] = 2
+    # Correct string type confusion in vendorid, payment_type
+    df['vendorid'] = df['vendorid'].astype(float).astype(int)


Why astype(float) before astype(int)?

I needed to chain the conversions because int("1.0") raises a ValueError, whereas int(float("1.0")) returns 1.
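The same behavior can be checked in plain Python. pandas' `astype(int)` on a column of strings fails on a value such as `"2.0"` for the same reason, which is why the diff converts through `float` first. The list below is a hypothetical stand-in for the `vendorid` column.

```python
def to_int(value: str) -> int:
    """Parse strings like "2" or "2.0" into an int by going through float first."""
    return int(float(value))


raw_vendor_ids = ["1", "2", "2.0", "1.0"]  # hypothetical mixed-format column values

# Direct int() parsing rejects the decimal strings.
failures = []
for v in raw_vendor_ids:
    try:
        int(v)
    except ValueError:
        failures.append(v)

cleaned = [to_int(v) for v in raw_vendor_ids]
print(failures)  # ['2.0', '1.0']
print(cleaned)   # [1, 2, 2, 1]
```

One caveat of this approach is that it silently truncates non-integral values such as `"2.5"` to 2 instead of rejecting them.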

lib/model.py

Add change detection benchmarker prototype

4b3c922

shreyashankar reviewed Apr 19, 2022

lib/data_clean_utils.py (outdated, resolved)

shreyashankar reviewed Apr 20, 2022

model.ipynb (3 resolved threads)

Add interpretability features + all models fully functional

9674a6d

1. Add F1 and accuracy scores to CSV
2. Aggregate CSVs
3. Track runtimes for each method
4. All methods now ready for use

Remove tip_amount from train labels

cf0d7b5

`tip_amount` leaks information to `tip_percent_greater_15`
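This is a leak because the label is computed from `tip_amount`, so a model that receives `tip_amount` as a feature can reconstruct the label exactly. A minimal sketch that assumes the label is `tip_amount / fare_amount > 0.15`; the PR's exact definition is not shown. The helper names and the `LEAKY_COLUMNS` set are hypothetical.

```python
LEAKY_COLUMNS = {"tip_amount"}  # columns the label is computed from


def tip_percent_greater_15(row: dict) -> int:
    """Assumed label definition: 1 if the tip exceeds 15% of the fare (fare > 0)."""
    return int(row["tip_amount"] / row["fare_amount"] > 0.15)


def split_features_and_label(row: dict) -> tuple[dict, int]:
    """Return (features, label) with the label's source columns removed."""
    features = {k: v for k, v in row.items() if k not in LEAKY_COLUMNS}
    return features, tip_percent_greater_15(row)


trip = {"fare_amount": 20.0, "trip_distance": 3.1, "tip_amount": 4.0}
features, label = split_features_and_label(trip)
print(features)  # {'fare_amount': 20.0, 'trip_distance': 3.1}
print(label)     # 1
```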

shreyashankar reviewed Apr 27, 2022

lib/model.py (outdated, resolved)

Replace registration decorators with direct inheritance

d38724f
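The following sketch contrasts the two approaches using hypothetical names. A registration decorator adds each class to a separate registry. With direct inheritance, the abstract base class enforces the interface itself and can list its implementations through `__subclasses__()`. `ModelWrapperInterface` appears in this PR, but the `fit` and `detect` methods shown here are assumptions.

```python
from abc import ABC, abstractmethod


class ModelWrapperInterface(ABC):
    """Contract every detector wrapper must satisfy (method names are assumed)."""

    @abstractmethod
    def fit(self, reference: list[float]) -> None: ...

    @abstractmethod
    def detect(self, current: list[float]) -> bool: ...


class ThresholdModel(ModelWrapperInterface):
    """Hypothetical detector: drift if the current max exceeds the reference max."""

    def fit(self, reference: list[float]) -> None:
        self.ref_max = max(reference)

    def detect(self, current: list[float]) -> bool:
        return max(current) > self.ref_max


class IncompleteModel(ModelWrapperInterface):
    """Forgets to implement detect(), so it cannot be instantiated."""

    def fit(self, reference: list[float]) -> None:
        pass


# Direct subclasses are discoverable without a separate registry.
available = {cls.__name__: cls for cls in ModelWrapperInterface.__subclasses__()}

try:
    IncompleteModel()
    incomplete_rejected = False
except TypeError:  # ABC refuses to instantiate classes with abstract methods
    incomplete_rejected = True

model = available["ThresholdModel"]()
model.fit([1.0, 2.0, 3.0])
print(model.detect([2.5, 4.0]))  # True
print(incomplete_rejected)       # True
```

Note that `__subclasses__()` returns only direct subclasses whose modules have already been imported, so each wrapper module still has to be imported somewhere.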

yuji3w changed the title from "[WIP] Add Drift Detector Benchmarker Prototype" to "Add Drift Detector Benchmarker Prototype" on Apr 28, 2022

yuji3w marked this pull request as draft on April 28, 2022 at 01:51

yuji3w wants to merge 4 commits into loglabs:main from yuji3w:main

yuji3w commented Apr 28, 2022
2 participants