This project builds an AI-powered Burmese news classifier using both traditional machine learning (Naïve Bayes) and deep learning (BiLSTM).
It includes a ready-to-run Streamlit web app for real-time predictions in the Burmese language.
- Classifies Burmese news articles into categories such as Politics, Business, Sports, and Entertainment.
- Implements custom Burmese NLP preprocessing: tokenization, stopword removal, and sequence padding.
- Compares the performance of Naïve Bayes (TF-IDF) and BiLSTM (Keras).
- Provides an interactive Streamlit UI for user testing and visualization.
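
The preprocessing steps listed above (tokenization, stopword removal, sequence padding) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: whitespace splitting stands in for a Burmese-aware tokenizer such as Pyidaungsu, the `STOPWORDS` set is a tiny hypothetical sample rather than the contents of `stopwords.txt`, and the helper names (`build_vocab`, `encode`, `pad`, `preprocess`) are made up for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical sample; in the project the list would be read from stopwords.txt.
STOPWORDS = {"သည်", "၏"}

PAD_ID = 0  # reserved id for padding positions
OOV_ID = 1  # reserved id for tokens not seen when the vocabulary was built


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: splits on whitespace (an assumption, not Pyidaungsu)."""
    return text.split()


def remove_stopwords(tokens: Iterable[str]) -> List[str]:
    """Drop every token that appears in the stopword set."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def build_vocab(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Assign integer ids in first-seen order, starting at 2."""
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, using OOV_ID for unknown tokens."""
    return [vocab.get(tok, OOV_ID) for tok in tokens]


def pad(ids: List[int], max_len: int) -> List[int]:
    """Truncate or right-pad with PAD_ID so the result has exactly max_len ids."""
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def preprocess(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Run the full chain: tokenize, filter stopwords, encode, pad."""
    return pad(encode(remove_stopwords(tokenize(text)), vocab), max_len)
```

Note that this sketch pads and truncates at the end of the sequence for readability, whereas Keras's `pad_sequences` does both at the front by default. Whichever convention is used must match between training and inference.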
| Model | Type | Accuracy | Description |
|---|---|---|---|
| Naïve Bayes | ML baseline | ~80% | TF-IDF features with scikit-learn |
| BiLSTM | Deep Learning | ~90% | Sequence model using TensorFlow/Keras |
├── app.py # Streamlit web app
├── Burmese_News_Classification.ipynb # Training notebook
├── requirements.txt # Dependencies
├── stopwords.txt # Burmese stopword list
├── models/ # Trained models
│ ├── bilstm_mynews.keras
│ ├── nb_tfidf.joblib
│ ├── keras_tokenizer.pkl
│ ├── label_encoder.pkl
│ └── config.json
└── README_streamlit.md # Detailed Streamlit setup
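
Because the app depends on several saved artifacts, it can help to check that they are all present before loading anything. The following is a hedged sketch of such a check, using only the file names shown in the tree above; the function names `resolve_artifacts` and `load_config` are hypothetical, and no assumption is made about which keys `config.json` contains.

```python
import json
from pathlib import Path
from typing import Any, Dict

# File names as listed in the project tree; the dictionary keys are illustrative.
ARTIFACT_FILES = {
    "bilstm": "bilstm_mynews.keras",
    "naive_bayes": "nb_tfidf.joblib",
    "tokenizer": "keras_tokenizer.pkl",
    "label_encoder": "label_encoder.pkl",
    "config": "config.json",
}


def resolve_artifacts(models_dir: str = "models") -> Dict[str, Path]:
    """Return the path of every expected artifact, failing early if any is missing."""
    base = Path(models_dir)
    paths = {name: base / filename for name, filename in ARTIFACT_FILES.items()}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing model artifacts: " + ", ".join(missing))
    return paths


def load_config(paths: Dict[str, Path]) -> Dict[str, Any]:
    """Read config.json as a dictionary without assuming any particular keys."""
    with open(paths["config"], encoding="utf-8") as fh:
        return json.load(fh)
```

Failing early with a single message that names every missing file tends to be easier to debug than a deserialization error raised partway through app startup.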
pip install -r requirements.txt
streamlit run app.py

Then open the provided local URL (e.g. http://localhost:8501) in your browser.
| Input Text | Predicted Category |
|---|---|
| "အဆိုတော်အသစ်တစ်ဦး ပြိုင်ပွဲတွင် ပထမဆုရရှိခဲ့သည်။" | Entertainment |
| "အစိုးရသည် စီးပွားရေးဖွံ့ဖြိုးရေးအတွက် ငွေထုတ်ပေးမည်။" | Business |
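
A classifier like the ones above typically outputs one probability per category, and the predicted category is the one with the highest probability. The sketch below illustrates that final step only. The category names come from this README, but their index order is an assumption (in practice it would come from the saved label encoder), and `top_prediction` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

# Category names from this README; the index order here is an assumption.
LABELS = ["Politics", "Business", "Sports", "Entertainment"]


def top_prediction(
    probs: Sequence[float], labels: List[str] = LABELS
) -> Tuple[str, float]:
    """Return the label with the highest probability, together with that probability."""
    if len(probs) != len(labels):
        raise ValueError("Expected one probability per label")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

For example, `top_prediction([0.05, 0.10, 0.05, 0.80])` returns `("Entertainment", 0.8)` under the label order assumed here.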
- Python 3.10
- TensorFlow / Keras
- Scikit-learn
- Pyidaungsu Tokenizer
- Streamlit for deployment
- Integrate Multilingual BERT (mBERT) for improved contextual understanding.
- Add a Gradio or Flask API endpoint for backend usage.
- Extend to multi-label classification (e.g., News + Sentiment).