[WIP] 21.10 Notebook Testing Report #457
Description
Describe the bug
I found errors in several CLX notebooks after running them against the 21.10 stable release. The errors reproduce on both CentOS and Ubuntu. I am not sure whether they result from recent updates to the codebase or from a previously uncaught bug.
Steps/Code to reproduce bug
Steps to reproduce the behavior:
1. Go to RAPIDS Sample Notebooks and clone the 21.10 branch
2. Click on the CLX folder
3. Run all the cells of the notebooks to produce the examples illustrated below
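The steps above can also be run headlessly, which makes the failures easier to collect in one pass. The following is a minimal sketch, assuming `jupyter nbconvert` is installed in the container; the helper names `build_command` and `run_notebooks` are hypothetical and not part of CLX.

```python
import subprocess
from pathlib import Path


def build_command(notebook: Path, out_dir: Path, timeout_s: int = 600) -> list:
    """Build the nbconvert command that executes one notebook headlessly."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        "--output-dir", str(out_dir),
        str(notebook),
    ]


def run_notebooks(folder: Path, out_dir: Path) -> dict:
    """Execute every notebook under `folder`; map failing notebooks to stderr tails."""
    failures = {}
    for notebook in sorted(folder.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in notebook.parts:
            continue  # skip Jupyter's autosave copies
        result = subprocess.run(
            build_command(notebook, out_dir),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Keep only the last lines, where nbconvert prints the traceback.
            failures[str(notebook)] = "\n".join(result.stderr.splitlines()[-20:])
    return failures
```

Calling `run_notebooks(Path("<clx notebook folder>"), Path("executed"))` returns an empty dict when every notebook runs cleanly; the folder path is a placeholder for wherever the 21.10 branch was cloned.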
Expected behavior
Several examples raise an error when run. Many examples lack details that would aid in implementation. The notebook code may be a few commits behind the 21.10 repo.
Environment details (please complete the following information):
- Environment location: Docker
- Linux Distro/Architecture: Ubuntu 20.04 amd64 and CentOS 8
- GPU Model/Driver: GV100, 450.142.00
- CUDA: 11.0
- Method of Library install: Docker Install
Additional context
Examples of Discrepancies:
Example # 1
CLX_Workflow_Notebook2
workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
workflow.run_workflow()
Error thrown below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-637a4e2a27ab> in <module>
1 workflow = SplunkAlertWorkflow(name="my-splunk-alert-workflow", source=source, destination=dest)
----> 2 workflow.run_workflow()
/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/workflow/workflow.py in run_workflow(self)
179 self._io_reader.fetch_data()
180 )
--> 181 enriched_dataframe = self.workflow(dataframe)
182 if enriched_dataframe and not enriched_dataframe.empty:
183 self._io_writer.write_data(enriched_dataframe)
<ipython-input-2-e6dcdb279a63> in workflow(self, dataframe)
8 # We use a splunk notable parser to parse data raw Splunk notable data.
9 snp = SplunkNotableParser()
---> 10 parsed_df = snp.parse(dataframe, raw_data_col_name)
11
12 # Create alerts dataframe
/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/parsers/splunk_notable_parser.py in parse(self, dataframe, raw_column)
48 """
49 # Cleaning raw data to be consistent.
---> 50 dataframe[raw_column] = dataframe[raw_column].str.replace("\\\\", "")
51 parsed_dataframe = self.parse_raw_event(dataframe, raw_column, self.event_regex)
52 # Replace null values of all columns with empty.
/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
72 def inner(*args, **kwds):
73 with self._recreate_cm():
---> 74 return func(*args, **kwds)
75 return inner
76
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in __getitem__(self, arg)
681 """
682 if _is_scalar_or_zero_d_array(arg) or isinstance(arg, tuple):
--> 683 return self._get_columns_by_label(arg, downcast=True)
684
685 elif isinstance(arg, slice):
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in _get_columns_by_label(self, labels, downcast)
1574 If downcast is True, try and downcast from a DataFrame to a Series
1575 """
-> 1576 new_data = super()._get_columns_by_label(labels, downcast)
1577 if downcast:
1578 if is_scalar(labels):
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/frame.py in _get_columns_by_label(self, labels, downcast)
524
525 """
--> 526 return self._data.select_by_label(labels)
527
528 def _get_columns_by_index(self, indices):
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in select_by_label(self, key)
344 if any(isinstance(k, slice) for k in key):
345 return self._select_by_label_with_wildcard(key)
--> 346 return self._select_by_label_grouped(key)
347
348 def select_by_index(self, index: Any) -> ColumnAccessor:
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column_accessor.py in _select_by_label_grouped(self, key)
406
407 def _select_by_label_grouped(self, key: Any) -> ColumnAccessor:
--> 408 result = self._grouped_data[key]
409 if isinstance(result, cudf.core.column.ColumnBase):
410 return self.__class__({key: result})
KeyError: 'Raw'
I am not able to run the workflow as expected, or determine success, without test source data.
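The traceback ends in `KeyError: 'Raw'`, which suggests the dataframe fetched from the source has no column with that name. A minimal sketch of an early check follows, assuming only that the dataframe exposes a `.columns` attribute (as cuDF and pandas dataframes do); `require_column` is a hypothetical helper, not a CLX function.

```python
def require_column(dataframe, column_name):
    """Fail early, listing the available columns, if `column_name` is missing."""
    available = [str(c) for c in dataframe.columns]
    if column_name not in available:
        raise KeyError(
            f"Column {column_name!r} not found; available columns: {available}. "
            "Check the workflow source configuration or raw_data_col_name."
        )
    return dataframe
```

Calling `require_column(dataframe, raw_data_col_name)` before `snp.parse(...)` in the notebook's `workflow` method would turn the deep cuDF traceback into a message that names the columns actually present.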
A second instance of the same issue
CLX_Workflow_Notebook3
workflow = SplunkAlertWorkflow(name="splunk_workflow", source=source, destination=dest,
threshold=2.0, raw_data_col_name="Raw")
workflow.run_workflow()
I am not able to run the workflow as expected, or determine success, without test source data.
Identical error thrown below:
---------------------------------------------------------------------------
KeyError: 'Raw'
Example # 2
anomalous_behavior_profiling_supervised
import xgboost as xgb
import cudf
#from cuml.preprocessing import train_test_split
from cuml.preprocessing.model_selection import train_test_split
from cuml import ForestInference
import sklearn.datasets
import cupy
df = cudf.read_json("./labelled_nv_smi.json")
Where
from cuml.preprocessing import train_test_split
Seems to be outdated and isn't recognized - I've changed to:
from cuml.preprocessing.model_selection import train_test_split
This revealed a second error: ./labelled_nv_smi.json cannot be found. I am not able to run the notebook as expected, or determine success, without a test JSON file or a correct file path.
Error thrown below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-8680e3b7e66c> in <module>
----> 1 df = cudf.read_json("./labelled_nv_smi.json")
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/json.py in read_json(path_or_buf, engine, dtype, lines, compression, byte_range, *args, **kwargs)
95 compression=compression,
96 *args,
---> 97 **kwargs,
98 )
99 df = cudf.from_pandas(pd_value)
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
205 else:
206 kwargs[new_arg_name] = new_arg_value
--> 207 return func(*args, **kwargs)
208
209 return cast(F, wrapper)
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
612
613 with json_reader:
--> 614 return json_reader.read()
615
616
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
746 obj = self._get_object_parser(self._combine_lines(data_lines))
747 else:
--> 748 obj = self._get_object_parser(self.data)
749 self.close()
750 return obj
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
768 obj = None
769 if typ == "frame":
--> 770 obj = FrameParser(json, **kwargs).parse()
771
772 if typ == "series" or obj is None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
883
884 else:
--> 885 self._parse_no_numpy()
886
887 if self.obj is None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1138 if orient == "columns":
1139 self.obj = DataFrame(
-> 1140 loads(json, precise_float=self.precise_float), dtype=None
1141 )
1142 elif orient == "split":
ValueError: Expected object or value
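For the import change described in this example, one option is to try several module locations in order, so the notebook keeps working across cuML versions. This is a sketch under assumptions: `import_first` is a hypothetical helper, and which of the candidate modules exist depends on the installed cuML release.

```python
import importlib


def import_first(attribute, module_candidates):
    """Return `attribute` from the first candidate module that provides it."""
    errors = []
    for module_name in module_candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}: {exc}")
    raise ImportError(
        f"Could not import {attribute!r} from any of {list(module_candidates)}:\n"
        + "\n".join(errors)
    )


# Candidate locations, newest first. Which of these exist depends on the
# installed cuML version, so treat this list as an assumption to verify.
TRAIN_TEST_SPLIT_MODULES = (
    "cuml.model_selection",
    "cuml.preprocessing.model_selection",
    "cuml.preprocessing",
)
```

In a notebook cell this would be used as `train_test_split = import_first("train_test_split", TRAIN_TEST_SPLIT_MODULES)`. If none of the candidates resolve, the error lists what was tried.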
A second instance of the same issue
Predictive_Maintenance_Sequence_Classifier
import cudf;
from cuml.model_selection._split import train_test_split
#from cuml.preprocessing.model_selection import train_test_split;
from clx.analytics.binary_sequence_classifier import BinarySequenceClassifier;
import s3fs;
from os import path;
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])
Where
from cuml.preprocessing import train_test_split
Seems to be outdated and isn't recognized - I've changed to:
from cuml.preprocessing.model_selection import train_test_split
This revealed a second error: kernel.tsv cannot be found. I am not able to run the notebook as expected, or determine success, without a test TSV file or a correct file path.
Error thrown below:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-556afabf946c> in <module>
----> 1 dflogs = cudf.read_csv("kernel.tsv", delimiter='\t', header=None, names=['label', 'log'])
/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
72 def inner(*args, **kwds):
73 with self._recreate_cm():
---> 74 return func(*args, **kwds)
75 return inner
76
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
108 na_filter=na_filter,
109 prefix=prefix,
--> 110 index_col=index_col,
111 )
112
cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()
FileNotFoundError: [Errno 2] No such file or directory: 'kernel.tsv'
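Both notebooks in this example fail because an input file is missing, though the JSON case surfaces as `ValueError: Expected object or value` rather than a file error, which appears to be how this pandas version reports an unreadable path. A minimal sketch of a pre-flight check that gives the same clear message in both cases follows; `resolve_input_file` and its `search_dirs` parameter are hypothetical.

```python
from pathlib import Path


def resolve_input_file(filename, search_dirs=(".",)):
    """Return the first existing path for `filename`, or raise with guidance."""
    tried = [Path(d) / filename for d in search_dirs]
    for candidate in tried:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{filename} was not found. Looked in: {[str(p) for p in tried]}. "
        "Download the dataset or point search_dirs at its location."
    )
```

A cell could then read `cudf.read_csv(str(resolve_input_file("kernel.tsv")), ...)`, so a missing dataset is reported before any parser is involved.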
Example # 3
cybert_example_training
padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
label_tensor = torch.tensor(int_labels).to('cuda')
I am not able to test this unless PyTorch is compiled with CUDA enabled.
Error thrown below:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-14-a9620f79a890> in <module>
1 padded_labels = [pad(x[:256], '[PAD]', 256) for x in subword_labels]
2 int_labels = [[label2id.get(l) for l in lab] for lab in padded_labels]
----> 3 label_tensor = torch.tensor(int_labels).to('cuda')
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
164 "Cannot re-initialize CUDA in forked subprocess. " + msg)
165 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166 raise AssertionError("Torch not compiled with CUDA enabled")
167 if _cudart is None:
168 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
After discovering the above I tried to remedy by reinstalling pytorch with CUDA enabled using:
from os import path
import s3fs
try:
    import pytorch; print('pytorch Version:', pytorch.__version__)
except ModuleNotFoundError:
    !conda install pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch -c nvidia -y
    import pytorch; print('pytorch Version:', pytorch.__version__)
#conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
from torch.utils.dlpack import from_dlpack
try:
    import seqeval; #print('seqeval Version:', seqeval.__version__)
except ModuleNotFoundError:
    !conda install -c conda-forge seqeval -y
    import seqeval; #print('seqeval Version:', seqeval.__version__)
from seqeval.metrics import classification_report,accuracy_score,f1_score
from transformers import BertForTokenClassification
from tqdm import tqdm,trange
from collections import defaultdict
import pandas as pd
import numpy as np
import cupy
import cudf
Which failed.
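Two details in the cell above may be worth checking. PyTorch is imported as `torch`, so `import pytorch` raises `ModuleNotFoundError` whether or not PyTorch is installed. The environment also lists CUDA 11.0 while the install command requests `cudatoolkit=11.4`, which may or may not be compatible with the 450.142.00 driver. The sketch below reports which PyTorch build is actually active; `describe_torch_build` is a hypothetical helper, while `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import importlib
import importlib.util


def describe_torch_build():
    """Return a small report on the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_build": None, "cuda_available": False}
    torch = importlib.import_module("torch")
    return {
        "installed": True,
        # None on CPU-only builds, otherwise the CUDA version it was built for.
        "cuda_build": torch.version.cuda,
        # True only if the build has CUDA support and a usable GPU is visible.
        "cuda_available": torch.cuda.is_available(),
    }


report = describe_torch_build()
print(report)
```

A report with `cuda_build` set to `None` would be consistent with the "Torch not compiled with CUDA enabled" assertion, meaning a CPU-only build is still the one being imported after the reinstall.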
A second instance of the same issue
cybert_log_parsing
cybert = Cybert()
cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)
After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled, using the same installation process.
Identical error thrown below:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-92030a3ca008> in <module>
1 cybert = Cybert()
----> 2 cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/cybert.py in load_model(self, model_filepath, config_filepath)
89 model_filepath, config=config_filepath,
90 )
---> 91 self._model.cuda()
92 self._model.eval()
93 self._model = nn.DataParallel(self._model)
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in cuda(self, device)
461 Module: self
462 """
--> 463 return self._apply(lambda t: t.cuda(device))
464
465 def cpu(self: T) -> T:
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
357 def _apply(self, fn):
358 for module in self.children():
--> 359 module._apply(fn)
360
361 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
357 def _apply(self, fn):
358 for module in self.children():
--> 359 module._apply(fn)
360
361 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
357 def _apply(self, fn):
358 for module in self.children():
--> 359 module._apply(fn)
360
361 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in _apply(self, fn)
379 # `with torch.no_grad():`
380 with torch.no_grad():
--> 381 param_applied = fn(param)
382 should_use_set_data = compute_should_use_set_data(param, param_applied)
383 if should_use_set_data:
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/nn/modules/module.py in <lambda>(t)
461 Module: self
462 """
--> 463 return self._apply(lambda t: t.cuda(device))
464
465 def cpu(self: T) -> T:
/opt/conda/envs/rapids/lib/python3.8/site-packages/torch/cuda/__init__.py in _lazy_init()
164 "Cannot re-initialize CUDA in forked subprocess. " + msg)
165 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166 raise AssertionError("Torch not compiled with CUDA enabled")
167 if _cudart is None:
168 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
A third instance of the same issue
CLX_Supervised_Asset_Classification
cat_cols.remove("label")
ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)
After discovering the error below, I tried to remedy it by reinstalling PyTorch with CUDA enabled, using the same installation process.
Identical error thrown below:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-3cbe42caeeeb> in <module>
1 cat_cols.remove("label")
----> 2 ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)
/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in train_model(self, train_gdf, cat_cols, cont_cols, label_col, batch_size, epochs, lr, wd)
96
97 self._model = TabularModel(embedding_sizes, n_cont, out_sz, self._layers, self._drops, self._emb_drop, self._is_reg, self._is_multi, self._use_bn)
---> 98 self._to_device(self._model, self._device)
99 self._config_optimizer()
100 for i in range(epochs):
/opt/conda/envs/rapids/lib/python3.7/site-packages/clx/analytics/asset_classification.py in _to_device(self, data, device)
264 if isinstance(data, (list, tuple)):
265 return [self._to_device(x, device) for x in data]
--> 266 return data.to(device, non_blocking=True)
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
610 return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
611
--> 612 return self._apply(convert)
613
614 def register_backward_hook(
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
357 def _apply(self, fn):
358 for module in self.children():
--> 359 module._apply(fn)
360
361 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
357 def _apply(self, fn):
358 for module in self.children():
--> 359 module._apply(fn)
360
361 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
379 # `with torch.no_grad():`
380 with torch.no_grad():
--> 381 param_applied = fn(param)
382 should_use_set_data = compute_should_use_set_data(param, param_applied)
383 if should_use_set_data:
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/nn/modules/module.py in convert(t)
608 if convert_to_format is not None and t.dim() == 4:
609 return t.to(device, dtype if t.is_floating_point() else None, non_blocking, memory_format=convert_to_format)
--> 610 return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
611
612 return self._apply(convert)
/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
164 "Cannot re-initialize CUDA in forked subprocess. " + msg)
165 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 166 raise AssertionError("Torch not compiled with CUDA enabled")
167 if _cudart is None:
168 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
Example # 4
DGA_Detection
%%time
dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=0.7)
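Where I received what looked like a memory error. The RuntimeError message itself says that 'aten::empty.memory_format' is not available for the CUDA backend in the installed PyTorch build.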
Error thrown below:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<timed eval> in <module>
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in train_model(self, train_data, labels, batch_size, epochs, train_size, truncate)
110 types_tensor = self._create_types_tensor(df["type"])
111 df = df.drop(["type", "domain"], axis=1)
--> 112 input, seq_lengths = self._create_variables(df)
113 model_result = self.model(input, seq_lengths)
114 loss = self._get_loss(model_result, types_tensor)
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _create_variables(self, df)
182 df = df.drop("len", axis=1)
183 seq_len_tensor = torch.LongTensor(seq_len_arr)
--> 184 seq_tensor = self._df2tensor(df)
185 # Return variables
186 # DataParallel requires everything to be a Variable
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _df2tensor(self, ascii_df)
195 """
196 dlpack_ascii_tensor = ascii_df.to_dlpack()
--> 197 seq_tensor = from_dlpack(dlpack_ascii_tensor).long()
198 return seq_tensor
199
RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. 'aten::empty.memory_format' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
CPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
MkldnnCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/MkldnnCPUType.cpp:144 [kernel]
SparseCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/SparseCPUType.cpp:239 [kernel]
BackendSelect: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/BackendSelectRegister.cpp:761 [kernel]
Named: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCPU: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCUDA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradXLA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse1: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse2: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse3: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
Tracer: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/TraceType_4.cpp:9291 [kernel]
Autocast: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
A second instance of a similar issue
custream_n_graph
output = source.map(process_batch).map(pagerank).sink_to_list()
This causes the kernel to restart.
A third instance of a similar issue
Phishing_Detection_using_Bert_CLX
seq_classifier.train_model(X_train["email"], y_train, epochs=1)
This causes the kernel to restart.
A fourth instance of a similar issue
pii_detection_training_example
This causes the kernel to restart.
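A kernel restart leaves no traceback, so it is hard to tell from the notebook alone what went wrong. One way to test whether GPU memory exhaustion is involved (an assumption, not something the report confirms) is to sample memory use from `nvidia-smi` before and during the failing cell. The helper names below are hypothetical; the `--query-gpu` and `--format` options are standard `nvidia-smi` flags.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse `used, total` MiB pairs, one GPU per line, into a list of tuples."""
    readings = []
    for line in text.strip().splitlines():
        try:
            used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # skip lines nvidia-smi could not report as numbers
        readings.append((used, total))
    return readings


def gpu_memory_mib():
    """Return [(used_mib, total_mib), ...] from nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_memory_csv(result.stdout)
```

If used memory is already close to the total before the training cell starts, reducing the batch size or restarting other kernels would be a reasonable first experiment.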
Example # 5
LODA_anomaly_detection
import cupy as cp
import cudf, cuml
import matplotlib.pylab as plt
import cuml.metrics as mt
try:
    import wget; print('wget Version:', wget.__version__)
except ModuleNotFoundError:
    !conda install -c conda-forge wget -y
    import wget; print('wget Version:', wget.__version__)
#import wget
import s3fs;
from os import path;
%matplotlib inline
from clx.analytics.loda import Loda
The wget module is not recognized, even after the conda install in the except branch.
Error thrown below:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
5 try:
----> 6 import wget; print('wget Version:', wget.__version__)
7 except ModuleNotFoundError:
ModuleNotFoundError: No module named 'wget'
During handling of the above exception, another exception occurred:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-444183b1dcb0> in <module>
7 except ModuleNotFoundError:
8 get_ipython().system('conda install -c rapidsai wget -y')
----> 9 import wget; print('wget Version:', wget.__version__)
10 #import wget
11 import s3fs;
ModuleNotFoundError: No module named 'wget'
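One possible explanation, not verified against this image, is that the conda package named `wget` installs the GNU Wget command-line tool rather than the Python module of the same name, so the second `import wget` still fails. If the notebook only needs to download a file (an assumption about how it uses `wget`), the standard library can do that without an extra dependency. `module_available` and `download` are hypothetical helpers.

```python
import importlib.util
import urllib.request
from pathlib import Path


def module_available(name):
    """True if `import name` would find a module in the current environment."""
    return importlib.util.find_spec(name) is not None


def download(url, destination):
    """Fetch `url` to `destination` using only the standard library."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return destination
```

Checking `module_available("wget")` right after the install step shows whether the Python module was actually added to the environment.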
Example # 6
FLAIR_DNS_Log_Parsing
data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)
The file 'query_output1545120200000_1545163200000.tab' is not available.
Error thrown below:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-a47cb0939166> in <module>
----> 1 data1 = cudf.read_csv('query_output1545120200000_1545163200000.tab', sep='\t',nrows=500000, quoting=3)
/opt/conda/envs/rapids/lib/python3.7/contextlib.py in inner(*args, **kwds)
72 def inner(*args, **kwds):
73 with self._recreate_cm():
---> 74 return func(*args, **kwds)
75 return inner
76
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
108 na_filter=na_filter,
109 prefix=prefix,
--> 110 index_col=index_col,
111 )
112
cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()
FileNotFoundError: [Errno 2] No such file or directory: 'query_output1545120200000_1545163200000.tab'
A second instance of the same issue
IDS_using_LODA
dir_path = "put/path/extracted/cic_ids2017/"
datasets = os.listdir(dir_path)
Error thrown below:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-2268e14fff44> in <module>
1 dir_path = "put/path/extracted/cic_ids2017/"
----> 2 datasets = os.listdir(dir_path)
FileNotFoundError: [Errno 2] No such file or directory: 'put/path/extracted/cic_ids2017/'
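Here the notebook ships with a placeholder path that the reader is expected to edit. A sketch of one alternative is to read the location from an environment variable and fail with instructions when it is unset or wrong. The variable name `CIC_IDS2017_DIR` and the helper `dataset_dir` are hypothetical, introduced only for illustration.

```python
import os
from pathlib import Path


def dataset_dir(env_var="CIC_IDS2017_DIR",
                placeholder="put/path/extracted/cic_ids2017/"):
    """Resolve the dataset directory from an environment variable and validate it."""
    # Fall back to the notebook's placeholder so the error message shows it.
    configured = os.environ.get(env_var, placeholder)
    path = Path(configured)
    if not path.is_dir():
        raise FileNotFoundError(
            f"Dataset directory {configured!r} does not exist. "
            f"Extract the dataset and set {env_var} to that folder."
        )
    return path
```

The notebook cell would then become `datasets = os.listdir(dataset_dir())`, and a first-time reader sees what to download and where to point the variable instead of a bare `FileNotFoundError`.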
Desired outcome
CLX notebooks should be ready to run and reproduce immediately, with minimal extra setup. Notebooks should be updated to reflect the commits made to the repositories during each release cycle. CLX functions and models should work as expected.
Request impacts
Our CLX notebooks are public and need to contain accurate information - Medium Priority
@taureandyernv @fondaing @efajardo-nv @bsuryadevara for awareness