Description
The problem is in the function extract_sentence_representations (line 119 of neurox/data/extraction/transformers_extractor.py).
The assertion on line 243 fails for the input shown in the output below.
Loading model: asafaya/albert-base-arabic
Reading input corpus
Reading filter vocabulary
Preparing output file: ...join_3.test.txt.asafaya-albert-base-arabic.hdf5
Extracting representations from model
Original (015): ['الله', 'a يرزقني', 'a بالشخص', 'a اللي', 'a اذا', 'a مت', 'a يدعي', 'a لي', 'a بكل', 'a صلاه', 'a .', 'a .', 'a ?', 'a ', 'a ?']
Tokenized (024): ['[CLS]', '▁الله', '▁يرزق', 'ني', '▁بال', 'شخص', '▁اللي', '▁اذا', '▁مت', '▁يدعي', '▁لي', '▁بكل', '▁صل', 'ا', 'ه', '▁', '.', '▁', '.', '▁', '?', '▁', '?', '[SEP]']
['▁الله', '▁يرزقني', '▁بالشخص', '▁اللي', '▁اذا', '▁مت', '▁يدعي', '▁لي', '▁بكل', '▁صلاه', '▁.', '▁.', '▁?', '▁', '?']
[24, 15182, 150, 64, 5746, 2299, 998, 764, 7825, 154, 610, 3793, 16, 15, 11, 9, 11, 9, 11, 5158, 11, 5158]
Res: counter: 23 ids_without_special_tokens: 22
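
For reference, the mismatch in the last line can be checked by hand. The snippet below is only a sketch: it copies the token list from the log above, does not call NeuroX or the tokenizer, and assumes (from the "Res:" line) that the failing check compares the counter with the number of tokens left after dropping [CLS] and [SEP]. The variable names are made up for illustration.

```python
# Token list copied verbatim from the "Tokenized (024)" line of the log.
tokenized = [
    '[CLS]', '▁الله', '▁يرزق', 'ني', '▁بال', 'شخص', '▁اللي', '▁اذا',
    '▁مت', '▁يدعي', '▁لي', '▁بكل', '▁صل', 'ا', 'ه', '▁', '.', '▁', '.',
    '▁', '?', '▁', '?', '[SEP]',
]

# Assumption: only [CLS] and [SEP] are treated as special tokens here.
special_tokens = {'[CLS]', '[SEP]'}
tokens_without_special = [t for t in tokenized if t not in special_tokens]

# Value reported on the "Res:" line of the log (not recomputed here).
reported_counter = 23

# Pieces that open a new SentencePiece word carry the '▁' prefix.
word_starts = sum(1 for t in tokens_without_special if t.startswith('▁'))

print("tokens without special:", len(tokens_without_special))
print("reported counter:", reported_counter)
print("pieces starting a word:", word_starts, "(original word count: 15)")
print("counts match:", reported_counter == len(tokens_without_special))
```

The counter is one higher than the number of non-special tokens, and the tokenized sequence has one fewer word-initial piece than the 15 original words. This would be consistent with the empty token near the end of the sentence ('a ' in the "Original" list) being counted differently in the two places, though that is a guess from the log rather than something confirmed in the code.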