
Burmese news text classification using Naïve Bayes and BiLSTM, custom NLP pipeline with Pyidaungsu tokenizer and stopword filtering.


9eek9/Burmese-News-Classification


📰 Burmese News Classification with BiLSTM & Streamlit

This project builds an AI-powered Burmese news classifier using both traditional machine learning (Naïve Bayes) and deep learning (BiLSTM).
It includes a ready-to-run Streamlit web app for real-time predictions on Burmese text.


🧩 Project Overview

  • Classifies Burmese news articles into categories such as Politics, Business, Sports, and Entertainment.
  • Implements custom Burmese NLP preprocessing: tokenization with Pyidaungsu, stopword removal, and sequence padding.
  • Compares performance between Naïve Bayes (TF-IDF) and BiLSTM (Keras).
  • Provides an interactive Streamlit UI for user testing and visualization.
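The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes `stopwords.txt` holds one word per line, uses whitespace splitting as a hypothetical stand-in for Pyidaungsu word segmentation (Burmese text normally has no spaces between words), and mimics the Keras `pad_sequences` defaults of pre-truncation and pre-padding with id 0. All helper names are hypothetical.

```python
from typing import Iterable

PAD_ID, OOV_ID = 0, 1  # reserved ids (an assumption, matching common Keras practice)


def load_stopwords(lines: Iterable[str]) -> set[str]:
    """Build a stopword set from lines such as those of stopwords.txt (one word per line assumed)."""
    return {line.strip() for line in lines if line.strip()}


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in: whitespace split. The project uses Pyidaungsu,
    # which segments unspaced Burmese text into words.
    return text.split()


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    return [tok for tok in tokens if tok not in stopwords]


def build_vocab(docs: list[list[str]]) -> dict[str, int]:
    """Assign ids to tokens in order of first appearance, after the reserved ids."""
    vocab = {"<pad>": PAD_ID, "<oov>": OOV_ID}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Unknown tokens map to the out-of-vocabulary id.
    return [vocab.get(tok, OOV_ID) for tok in tokens]


def pad(ids: list[int], max_len: int) -> list[int]:
    """Pre-truncate and pre-pad to max_len, like the Keras pad_sequences defaults."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    ids = ids[-max_len:]  # keep the last max_len ids
    return [PAD_ID] * (max_len - len(ids)) + ids


stopwords = load_stopwords(["သည်", "၏", ""])
docs = [
    remove_stopwords(tokenize(text), stopwords)
    for text in ["အစိုးရ သည် ငွေ ထုတ်ပေးမည်", "အဆိုတော် ပထမဆု ရရှိခဲ့သည်"]
]
vocab = build_vocab(docs)
print(pad(to_ids(docs[0], vocab), 5))  # → [0, 0, 2, 3, 4]
```

Reserving id 0 for padding keeps padded positions distinguishable from real tokens, which is what an embedding layer with masking expects.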

📊 Model Highlights

| Model | Type | Accuracy | Description |
|---|---|---|---|
| Naïve Bayes | ML baseline | ~80% | TF-IDF features with scikit-learn |
| BiLSTM | Deep learning | ~90% | Sequence model using TensorFlow/Keras |
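A TF-IDF plus Naïve Bayes baseline of the kind listed in the table can be assembled as a scikit-learn pipeline. This is a sketch under stated assumptions: `MultinomialNB` is assumed as the Naïve Bayes variant, `str.split` stands in for Pyidaungsu tokenization, `build_nb_baseline` is a hypothetical helper, and the toy English-token data is illustrative only, not the project's dataset. The notebook's actual vectorizer settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline


def build_nb_baseline(alpha: float = 1.0) -> Pipeline:
    """Hypothetical helper: TF-IDF features feeding a Multinomial Naive Bayes classifier."""
    vectorizer = TfidfVectorizer(
        tokenizer=str.split,  # stand-in for Pyidaungsu word segmentation
        token_pattern=None,   # unused when a custom tokenizer is given
        lowercase=False,      # Burmese script has no letter case
    )
    return make_pipeline(vectorizer, MultinomialNB(alpha=alpha))


# Toy, pre-segmented documents for illustration only (not the project's data).
texts = [
    "goal match team win",
    "team score goal",
    "market price trade",
    "trade bank price",
]
labels = ["Sports", "Sports", "Business", "Business"]

clf = build_nb_baseline().fit(texts, labels)
print(clf.predict(["goal team", "bank trade"]))  # → ['Sports' 'Business']
```

Wrapping the vectorizer and classifier in one pipeline means a single object can be fitted, persisted with `joblib`, and applied to raw segmented text at prediction time.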

📁 Project Structure

├── app.py                        # Streamlit web app
├── Burmese_News_Classification.ipynb  # Training notebook
├── requirements.txt              # Dependencies
├── stopwords.txt                 # Burmese stopword list
├── models/                       # Trained models
│   ├── bilstm_mynews.keras
│   ├── nb_tfidf.joblib
│   ├── keras_tokenizer.pkl
│   ├── label_encoder.pkl
│   └── config.json
└── README_streamlit.md           # Detailed Streamlit setup
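The files under `models/` suggest an inference path for the BiLSTM: map segmented text to ids with the saved Keras tokenizer, pad to a fixed length, predict, and decode the class index with the label encoder. The function below is a hypothetical sketch of that path. It assumes the `.pkl` files are standard pickles of a Keras `Tokenizer` and a scikit-learn `LabelEncoder`, that the model ends in a softmax over categories, that training used Keras's default pre-padding, and that the padding length is known (where it is stored, for example in `config.json`, is also an assumption).

```python
import numpy as np

# Loading the saved artifacts might look like this (file formats are assumptions):
#   import pickle
#   from tensorflow import keras
#   with open("models/keras_tokenizer.pkl", "rb") as f:
#       tokenizer = pickle.load(f)
#   with open("models/label_encoder.pkl", "rb") as f:
#       label_encoder = pickle.load(f)
#   model = keras.models.load_model("models/bilstm_mynews.keras")


def predict_category(text, tokenizer, model, label_encoder, max_len):
    """Hypothetical BiLSTM inference helper returning (label, probability).

    `text` should already be word-segmented (e.g. by Pyidaungsu) and
    space-joined, because the Keras Tokenizer splits on whitespace.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    ids = tokenizer.texts_to_sequences([text])[0]
    ids = ids[-max_len:]                       # pre-truncation (assumed to match training)
    padded = [0] * (max_len - len(ids)) + ids  # pre-padding with id 0
    probs = model.predict(np.array([padded]), verbose=0)[0]
    idx = int(np.argmax(probs))                # most probable class index
    label = label_encoder.inverse_transform([idx])[0]
    return label, float(probs[idx])
```

Under these assumptions, `predict_category(segmented_text, tokenizer, model, label_encoder, max_len)` returns a category name such as `"Business"` together with the model's probability for it.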

🚀 Running the Streamlit App

```bash
pip install -r requirements.txt
streamlit run app.py
```

Then open the provided local URL (e.g. http://localhost:8501) in your browser.


🧠 Example Predictions

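| Input text | English gloss | Predicted category |
|---|---|---|
| "အဆိုတော်အသစ်တစ်ဦး ပြိုင်ပွဲတွင် ပထမဆုရရှိခဲ့သည်။" | A new singer won first prize in a competition. | Entertainment |
| "အစိုးရသည် စီးပွားရေးဖွံ့ဖြိုးရေးအတွက် ငွေထုတ်ပေးမည်။" | The government will release funds for economic development. | Business |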

🧰 Technologies Used

  • Python 3.10
  • TensorFlow / Keras
  • Scikit-learn
  • Pyidaungsu Tokenizer
  • Streamlit for deployment

🔮 Future Enhancements

  • Integrate Multilingual BERT (mBERT) for improved contextual understanding.
  • Add a Gradio or Flask API endpoint for backend use.
  • Extend to multi-label classification (e.g., news category plus sentiment).
