Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
@@ -0,0 +1,478 @@
Line #10. hash_vocab('bert-base-cased-vocab.txt', 'voc_hash.txt')
Todo: figure out if we can get a download link for this.
Line #16. df = getDF('/nvme/1/ssayyah/nv-wip/amazon_bookreview.json.gz')
Todo: figure out if we can get a download link for this.
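The getDF helper itself isn't shown in the diff. A minimal sketch of how such a loader might look, assuming the review dump is a gzipped JSON-lines file (one JSON object per line) and that pandas is the target DataFrame library; the function names mirror the notebook's call but are otherwise illustrative:

```python
import gzip
import json

import pandas as pd


def parse(path):
    # Yield one parsed review record per line of the gzipped file.
    with gzip.open(path, "rt") as f:
        for line in f:
            yield json.loads(line)


def getDF(path):
    # Materialize the records into a DataFrame, one row per review.
    return pd.DataFrame(parse(path))
```

If the notebook wants the data on the GPU, `cudf.from_pandas(getDF(path))` would move the resulting frame into cuDF.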
Line #5. bert = AutoModel.from_pretrained('bert-base-uncased')
Replace with this model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
Can we pad/truncate to a fixed length to keep the example minimal, maybe something like 256?
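With the Hugging Face tokenizer this fixed-length behaviour is typically `truncation=True, padding="max_length", max_length=256`. The underlying idea can be sketched in plain Python; `pad_or_truncate` and the pad id 0 are illustrative assumptions, not code from the notebook:

```python
def pad_or_truncate(ids, max_len=256, pad_id=0):
    # Clip sequences longer than max_len, then right-pad the rest
    # with pad_id so every review yields the same tensor shape.
    ids = list(ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))
```

A fixed length keeps the batching code trivial, at the cost of truncating reviews longer than 256 tokens.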
Line #1. train_seq = torch.tensor(tokens_train['input_ids'])
This goes through the CUDA array interface rather than DLPack. Are there performance implications of either?
First PR draft of an example that uses the cuML BERT tokenizer and a BERT model for sentiment classification on the Amazon book review dataset.