[WIP] 21.10 Notebook Testing Report #457

**Describe the bug**
Several notebooks that use the CLX library raise errors when run against the 21.10 stable release. The errors reproduce on both CentOS and Ubuntu. It is unclear whether they stem from recent changes to the codebase or from a previously uncaught bug.

**Steps/Code to reproduce bug**
Steps to reproduce the behavior:
1. Go to [RAPIDS Sample Notebooks](https://github.com/rapidsai/notebooks/tree/branch-21.10) and clone the 21.10 branch.
2. Open the [CLX folder](https://github.com/rapidsai/clx/tree/branch-21.12/notebooks).
3. Run all the cells of each notebook to reproduce the examples illustrated below.
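
The manual steps above can also be approximated with a script. The following is a minimal sketch, not the process used for this report: it assumes `jupyter nbconvert` is installed in the environment, and the function names and the `executed` output directory are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(notebook, output_dir="executed"):
    """Build the nbconvert command that executes one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def run_notebooks(root, runner=subprocess.run):
    """Execute every .ipynb under `root` and return the paths that failed."""
    failed = []
    for notebook in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        result = runner(build_command(notebook))
        if result.returncode != 0:
            failed.append(notebook)
    return failed
```

Calling `run_notebooks("notebooks")` would return the notebooks whose execution exited with a non-zero status, which gives a quick list to triage by hand.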

**Expected behavior**
Several examples raise an error. Many examples omit details that would aid implementation. The notebook code may be a few commits behind the 21.10 repo.

**Environment details (please complete the following information):**
 - Environment location: Docker
 - Linux Distro/Architecture: Ubuntu 20.04 amd64 and CentOS 8
 - GPU Model/Driver: GV100, 450.142.00
 - CUDA: 11.0
 - Method of Library install: Docker Install
 
**Additional context**
Examples of Discrepancies:

Example # 1
CLX_Workflow_Notebook2

```
workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
workflow.run_workflow()
```

Error thrown below:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-637a4e2a27ab> in <module>
      1 workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
----> 2 workflow.run_workflow()

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/workflow/workflow.py in run_workflow(self)
    179                     self._io_reader.fetch_data()
    180                 )
--> 181                 enriched_dataframe = self.workflow(dataframe)
    182                 if enriched_dataframe and not enriched_dataframe.empty:
    183                     self._io_writer.write_data(enriched_dataframe)

<ipython-input-2-e6dcdb279a63> in workflow(self, dataframe)
      8         # We use a splunk notable parser to parse data raw Splunk notable data.
      9         snp = SplunkNotableParser()
---> 10         parsed_df = snp.parse(dataframe, raw_data_col_name)
     11 
     12         # Create alerts dataframe

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/parsers/splunk_notable_parser.py in parse(self, dataframe, raw_column)
     48         """
     49         # Cleaning raw data to be consistent.
---> 50         dataframe[raw_column] = dataframe[raw_column].str.replace("\\\\", "")
     51         parsed_dataframe = self.parse_raw_event(dataframe, raw_column, self.event_regex)
     52         # Replace null values of all columns with empty.

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in __getitem__(self, arg)
    681         """
    682         if _is_scalar_or_zero_d_array(arg) or isinstance(arg, tuple):
--> 683             return self._get_columns_by_label(arg, downcast=True)
    684 
    685         elif isinstance(arg, slice):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in _get_columns_by_label(self, labels, downcast)
   1574         If downcast is True, try and downcast from a DataFrame to a Series
   1575         """
-> 1576         new_data = super()._get_columns_by_label(labels, downcast)
   1577         if downcast:
   1578             if is_scalar(labels):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/frame.py in _get_columns_by_label(self, labels, downcast)
    524 
    525         """
--> 526         return self._data.select_by_label(labels)
    527 
    528     def _get_columns_by_index(self, indices):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in select_by_label(self, key)
    344                 if any(isinstance(k, slice) for k in key):
    345                     return self._select_by_label_with_wildcard(key)
--> 346             return self._select_by_label_grouped(key)
    347 
    348     def select_by_index(self, index: Any) -> ColumnAccessor:

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in _select_by_label_grouped(self, key)
    406 
    407     def _select_by_label_grouped(self, key: Any) -> ColumnAccessor:
--> 408         result = self._grouped_data[key]
    409         if isinstance(result, cudf.core.column.ColumnBase):
    410             return self.__class__({key: result})

KeyError: 'Raw'
```
The workflow cannot be run as expected without test source data to determine success.
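
The `KeyError: 'Raw'` above is raised when the dataframe is indexed with the raw data column name. A guard like the following could surface a missing column earlier and with a clearer message. This is only a sketch: `require_column` is a hypothetical helper, not part of CLX, and it assumes only that the dataframe exposes a `.columns` attribute, as cuDF and pandas dataframes do.

```python
def require_column(dataframe, column):
    """Raise a descriptive error if `column` is missing from `dataframe`."""
    available = [str(name) for name in dataframe.columns]
    if column not in available:
        raise KeyError(
            f"Column {column!r} not found; available columns: {available}. "
            "Check that the workflow source actually provides this column."
        )
    return dataframe
```

Inside the notebook's `workflow` method, this could be called as `require_column(dataframe, raw_data_col_name)` just before `snp.parse(...)`.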

A second instance of the same issue
CLX_Workflow_Notebook3
```
workflow = SplunkAlertWorkflow(name="splunk_workflow", source=source, destination=dest,
                               threshold=2.0, raw_data_col_name="Raw")
workflow.run_workflow()
```
The workflow cannot be run as expected without test source data to determine success.

Identical Error thrown below:
```
---------------------------------------------------------------------------
KeyError: 'Raw'
```

Example # 2
anomalous_behavior_profiling_supervised

```
import xgboost as xgb
import cudf
#from cuml.preprocessing import train_test_split
from cuml.preprocessing.model_selection import train_test_split
from cuml import ForestInference
import sklearn.datasets
import cupy

df = cudf.read_json("./labelled_nv_smi.json")
```
The original import
```
from cuml.preprocessing import train_test_split
```
seems to be outdated and is not recognized, so I changed it to:
```
from cuml.preprocessing.model_selection import train_test_split
```
This revealed a second error: `./labelled_nv_smi.json` cannot be found. The notebook cannot be run as expected without a test JSON file or a correct file path to determine success.
 
Error thrown below:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-8680e3b7e66c> in <module>
----> 1 df = cudf.read_json("./labelled_nv_smi.json")

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/json.py in read_json(path_or_buf, engine, dtype, lines, compression, byte_range, *args, **kwargs)
     95                 compression=compression,
     96                 *args,
---> 97                 **kwargs,
     98             )
     99         df = cudf.from_pandas(pd_value)

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
    612 
    613     with json_reader:
--> 614         return json_reader.read()
    615 
    616 

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
    746                 obj = self._get_object_parser(self._combine_lines(data_lines))
    747         else:
--> 748             obj = self._get_object_parser(self.data)
    749         self.close()
    750         return obj

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    768         obj = None
    769         if typ == "frame":
--> 770             obj = FrameParser(json, **kwargs).parse()
    771 
    772         if typ == "series" or obj is None:

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
    883 
    884         else:
--> 885             self._parse_no_numpy()
    886 
    887         if self.obj is None:

/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1138         if orient == "columns":
   1139             self.obj = DataFrame(
-> 1140                 loads(json, precise_float=self.precise_float), dtype=None
   1141             )
   1142         elif orient == "split":

ValueError: Expected object or value
```
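
Because `./labelled_nv_smi.json` is not found at the given path, a check before reading could make the failure easier to interpret than `ValueError: Expected object or value`. A minimal sketch follows, where `require_file` and its `hint` parameter are hypothetical names and not part of cuDF.

```python
from pathlib import Path


def require_file(path, hint=""):
    """Return `path` as a Path, or raise FileNotFoundError with a hint."""
    resolved = Path(path).expanduser()
    if not resolved.is_file():
        message = f"Data file not found: {resolved} (cwd: {Path.cwd()})"
        if hint:
            message += f". {hint}"
        raise FileNotFoundError(message)
    return resolved
```

In the notebook this might read `cudf.read_json(str(require_file("./labelled_nv_smi.json", hint="See the notebook README for the data source")))`, with the hint text replaced by wherever the data actually lives.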

A second instance of the same issue
Predictive_Maintenance_Sequence_Classifier
```
import cudf;
from cuml.model_selection._split import train_test_split
#from cuml.preprocessing.model_selection import train_test_split;
from clx.analytics.binary_sequence_classifier import BinarySequenceClassifier;
import s3fs;
from os import path;
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score

dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])
```
The original import
```
from cuml.preprocessing import train_test_split
```
seems to be outdated and is not recognized, so I changed it to:
```
from cuml.preprocessing.model_selection import train_test_split
```
This revealed a second error: `kernel.tsv` cannot be found. The notebook cannot be run as expected without a test TSV file or a correct file path to determine success.

Error thrown below:
```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-556afabf946c> in <module>
----> 1 dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
    108         na_filter=na_filter,
    109         prefix=prefix,
--> 110         index_col=index_col,
    111     )
    112 

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

FileNotFoundError: [Errno 2] No such file or directory: 'kernel.tsv'
```
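
The import location of `train_test_split` differs between the snippets in this report. One way to tolerate that across cuML versions is to try the candidate modules in order. This is a sketch: `import_first` is a hypothetical helper, and the candidate module paths are only those quoted in this report, not a verified list for any particular cuML release.

```python
import importlib


def import_first(candidates, attribute):
    """Return `attribute` from the first module in `candidates` that provides it."""
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}: {exc}")
    raise ImportError(f"Could not import {attribute!r}; tried: " + "; ".join(errors))


# Module paths as quoted in this report (availability depends on the cuML version).
CUML_SPLIT_MODULES = [
    "cuml.model_selection._split",
    "cuml.preprocessing.model_selection",
    "cuml.preprocessing",
]
```

A notebook cell could then use `train_test_split = import_first(CUML_SPLIT_MODULES, "train_test_split")` in place of a single hard-coded import.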

Example # 3
cybert_example_training
```
padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
label_tensor = torch.tensor(int_labels).to('cuda')
```

This cell cannot be tested unless Torch is compiled with CUDA enabled.

Error thrown below:
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-a9620f79a890> in <module>
      1 padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
      2 int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
----> 3 label_tensor = torch.tensor(int_labels).to('cuda')

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled
```

After discovering the above, I tried to remedy it by reinstalling PyTorch with CUDA enabled using:

```
from os import path
import s3fs

try:
        import pytorch; print('pytorch Version:', pytorch.__version__)  
except ModuleNotFoundError:
        !conda install pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch -c nvidia -y
        import pytorch; print('pytorch Version:', pytorch.__version__)
        
#conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
from torch.utils.dlpack import from_dlpack


try:
        import seqeval; #print('seqeval Version:', seqeval.__version__)  
except ModuleNotFoundError:
        !conda install -c conda-forge seqeval -y
        import seqeval; #print('seqeval Version:', seqeval.__version__)

from seqeval.metrics import classification_report,accuracy_score,f1_score
from transformers import BertForTokenClassification
from tqdm import tqdm,trange
from collections import defaultdict
import pandas as pd
import numpy as np
import cupy
import cudf
 ```

This attempt failed.
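
One detail in the cell above may matter: the conda package is named `pytorch`, but the importable module is `torch`, so `import pytorch` would be expected to raise `ModuleNotFoundError` even after the install succeeds. A quick diagnostic along these lines could confirm whether the installed build has CUDA support. This is a sketch, and `torch_cuda_status` is a hypothetical helper name.

```python
import importlib.util


def torch_cuda_status():
    """Report whether `torch` is importable and whether its build supports CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_build": None, "cuda_available": False}
    import torch  # the import name is `torch`, not `pytorch`

    return {
        "installed": True,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_status())
```

If `cuda_build` comes back as `None`, the environment holds a CPU-only build, which would be consistent with the `Torch not compiled with CUDA enabled` assertion.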

A second instance of the same issue
cybert_log_parsing

```
cybert = Cybert()
cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)
```

After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled using the same installation process.

Identical Error thrown below:
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-92030a3ca008> in <module>
      1 cybert = Cybert()
----> 2 cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/cybert.py in load_model(self, model_filepath, config_filepath)
     89             model_filepath, config=config_filepath,
     90         )
---> 91         self._model.cuda()
     92         self._model.eval()
     93         self._model = nn.DataParallel(self._model)

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in cuda(self, device)
    461             Module: self
    462         """
--> 463         return self._apply(lambda t: t.cuda(device))
    464 
    465     def cpu(self: T) -> T:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    379                 # `with torch.no_grad():`
    380                 with torch.no_grad():
--> 381                     param_applied = fn(param)
    382                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    383                 if should_use_set_data:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in <lambda>(t)
    461             Module: self
    462         """
--> 463         return self._apply(lambda t: t.cuda(device))
    464 
    465     def cpu(self: T) -> T:

/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled
```

A third instance of the same issue
CLX_Supervised_Asset_Classification
```
cat_cols.remove("label")
ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)
```
After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled using the same installation process.

Identical Error thrown below:
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-3cbe42caeeeb> in <module>
      1 cat_cols.remove("label")
----> 2 ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in train_model(self, train_gdf, cat_cols, cont_cols, label_col, batch_size, epochs, lr, wd)
     96 
     97         self._model = TabularModel(embedding_sizes, n_cont, out_sz, self._layers, self._drops, self._emb_drop, self._is_reg, self._is_multi, self._use_bn)
---> 98         self._to_device(self._model, self._device)
     99         self._config_optimizer()
    100         for i in range(epochs):

/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in _to_device(self, data, device)
    264         if isinstance(data, (list, tuple)):
    265             return [self._to_device(x, device) for x in data]
--> 266         return data.to(device, non_blocking=True)

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    610             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    611 
--> 612         return self._apply(convert)
    613 
    614     def register_backward_hook(

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    357     def _apply(self, fn):
    358         for module in self.children():
--> 359             module._apply(fn)
    360 
    361         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    379                 # `with torch.no_grad():`
    380                 with torch.no_grad():
--> 381                     param_applied = fn(param)
    382                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    383                 if should_use_set_data:

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in convert(t)
    608             if convert_to_format is not None and t.dim() == 4:
    609                 return t.to(device, dtype if t.is_floating_point() else None, non_blocking, memory_format=convert_to_format)
--> 610             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    611 
    612         return self._apply(convert)

/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
    164                 "Cannot re-initialize CUDA in forked subprocess. " + msg)
    165         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166             raise AssertionError("Torch not compiled with CUDA enabled")
    167         if _cudart is None:
    168             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled
```

Example # 4
DGA_Detection

```
%%time
dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=0.7)
```

This raised a runtime error (the message refers to `aten::empty.memory_format` and the 'CUDA' backend).

Error thrown below:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed eval> in <module>

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in train_model(self, train_data, labels, batch_size, epochs, train_size, truncate)
    110                     types_tensor = self._create_types_tensor(df["type"])
    111                     df = df.drop(["type", "domain"], axis=1)
--> 112                     input, seq_lengths = self._create_variables(df)
    113                     model_result = self.model(input, seq_lengths)
    114                     loss = self._get_loss(model_result, types_tensor)

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _create_variables(self, df)
    182         df = df.drop("len", axis=1)
    183         seq_len_tensor = torch.LongTensor(seq_len_arr)
--> 184         seq_tensor = self._df2tensor(df)
    185         # Return variables
    186         # DataParallel requires everything to be a Variable

/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _df2tensor(self, ascii_df)
    195         """
    196         dlpack_ascii_tensor = ascii_df.to_dlpack()
--> 197         seq_tensor = from_dlpack(dlpack_ascii_tensor).long()
    198         return seq_tensor
    199 

RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. 'aten::empty.memory_format' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
MkldnnCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/MkldnnCPUType.cpp:144 [kernel]
SparseCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/SparseCPUType.cpp:239 [kernel]
BackendSelect: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/BackendSelectRegister.cpp:761 [kernel]
Named: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCPU: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCUDA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradXLA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse1: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse2: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse3: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
Tracer: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/TraceType_4.cpp:9291 [kernel]
Autocast: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
```

A second instance of a similar issue
custream_n_graph

```
output = source.map(process_batch).map(pagerank).sink_to_list()
```

This causes the kernel to restart.

A third instance of a similar issue
Phishing_Detection_using_Bert_CLX
```
seq_classifier.train_model(X_train["email"], y_train, epochs=1)
```

This causes the kernel to restart.

A fourth instance of a similar issue
pii_detection_training_example

```
pii_detection_training_example
```

This causes the kernel to restart.

Example # 5 
LODA_anomaly_detection

```
import cupy as cp 
import cudf, cuml 
import matplotlib.pylab as plt 
import cuml.metrics as mt
try:
        import wget; print('wget Version:', wget.__version__)  
except ModuleNotFoundError:
        !conda install -c conda-forge wget -y
        import wget; print('wget Version:', wget.__version__)
#import wget
import s3fs;
from os import path;
%matplotlib inline 

from clx.analytics.loda import Loda
```

The `wget` module is not recognized, even after the conda install in the `except` branch.
   
Error thrown below:
```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
      5 try:
----> 6         import wget; print('wget Version:', wget.__version__)
      7 except ModuleNotFoundError:

ModuleNotFoundError: No module named 'wget'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
      7 except ModuleNotFoundError:
      8         get_ipython().system('conda install -c rapidsai wget -y')
----> 9         import wget; print('wget Version:', wget.__version__)
     10 #import wget
     11 import s3fs;

ModuleNotFoundError: No module named 'wget'
```
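
The conda package named `wget` likely provides the GNU Wget command-line tool rather than the Python module, which would explain why the import still fails after installation. If the notebook only uses `wget` to download a file, the standard library can do the same without an extra dependency. The following is a sketch under that assumption; `download` is a hypothetical helper and not what the notebook currently does.

```python
import urllib.request
from pathlib import Path


def download(url, destination):
    """Download `url` to `destination` unless the file already exists."""
    target = Path(destination)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
    return target
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the same data again.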

Example # 6 
FLAIR_DNS_Log_Parsing

```
data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)
```

The file `query_output1545120200000_1545163200000.tab` is not available.
   
Error thrown below:
```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-a47cb0939166> in <module>
----> 1 data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)

/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
    108         na_filter=na_filter,
    109         prefix=prefix,
--> 110         index_col=index_col,
    111     )
    112 

cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()

FileNotFoundError: [Errno 2] No such file or directory: 'query_output1545120200000_1545163200000.tab'
```

A second instance of the same issue
IDS_using_LODA

```
dir_path = "put/path/extracted/cic_ids2017/"
datasets = os.listdir(dir_path)
```

Error thrown below:
```
---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-2268e14fff44> in <module>
      1 dir_path = "put/path/extracted/cic_ids2017/"
----> 2 datasets = os.listdir(dir_path)

FileNotFoundError: [Errno 2] No such file or directory: 'put/path/extracted/cic_ids2017/'
```
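
The path `put/path/extracted/cic_ids2017/` is a placeholder that each user has to replace before the cell can run. A sketch of making that explicit follows; the environment variable name `CIC_IDS2017_DIR` and the helper `list_datasets` are hypothetical and not part of the notebook.

```python
import os
from pathlib import Path


def list_datasets(default="put/path/extracted/cic_ids2017/"):
    """List dataset files, failing early if the directory does not exist."""
    # CIC_IDS2017_DIR is a hypothetical variable name used for illustration.
    dir_path = Path(os.environ.get("CIC_IDS2017_DIR", default))
    if not dir_path.is_dir():
        raise FileNotFoundError(
            f"Dataset directory {dir_path} does not exist. Extract the dataset "
            "and set CIC_IDS2017_DIR to its location."
        )
    return sorted(os.listdir(dir_path))
```

This keeps the placeholder as the default while telling the reader what to do when it has not been replaced.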

**Desired outcome**
CLX notebooks should be ready to replicate and implement with minimal effort. Notebooks should be updated to reflect the commits made to the repositories during each release cycle. CLX functions and models should work as expected.

**Request impacts**
Our CLX notebooks are public and require accurate information - Medium Priority

 @taureandyernv @fondaing  @efajardo-nv @bsuryadevara for awareness

