This work is based on the NLP section of the Udacity Data Science Nanodegree. It presents code for the following items:
- Text Normalization
- Case Normalization
- Punctuation Removal
- Parts of Speech (POS) Tagging
- Stemming and Lemmatizing
- Stop Words
- Vectorizing Text
- CountVectorizer - Bag of Words
- TfidfTransformer - TF-IDF values
- TfidfVectorizer - Bag of Words AND TF-IDF values
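
To make the items above concrete, here is a minimal sketch of case normalization, punctuation removal, stop-word filtering, bag-of-words counting, and TF-IDF weighting, written with the Python standard library only. It is not the course's code: the helper names (`normalize`, `tokenize`, `bag_of_words`, `tf_idf`), the tiny `STOP_WORDS` set, and the sample sentences are assumptions made for illustration. The smoothed IDF formula is, to my understanding, the one scikit-learn applies by default, but this sketch skips the row normalization step, so its values should not be expected to match `TfidfVectorizer` output.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "on"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Case normalization followed by punctuation removal."""
    return text.lower().translate(PUNCT_TABLE)


def tokenize(text):
    """Normalize, split on whitespace, and drop stop words."""
    return [tok for tok in normalize(text).split() if tok not in STOP_WORDS]


def bag_of_words(docs):
    """Return (sorted vocabulary, count matrix) with one row per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counters = [Counter(toks) for toks in tokenized]
    matrix = [[counter[term] for term in vocab] for counter in counters]
    return vocab, matrix


def tf_idf(matrix):
    """Multiply raw counts by a smoothed inverse document frequency."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

print(vocab)
print(counts)
print(weights)
```

In this sketch a term that appears in every document (here `cat`) keeps a weight equal to its raw count, while rarer terms are scaled up. Stemming, lemmatizing, and POS tagging are left out because they need a library such as NLTK and its downloadable resources.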
Petroleum engineer - Upstream oil and gas analyst - Energy Analyst - Data Champion