A scalable, modular, and memory-efficient text summarization system using facebook/bart-base, built with full MLOps-style pipelines for data ingestion, validation, transformation, training, evaluation, and deployment.
⚙️ Designed for research and real-world deployment with CLI, Streamlit frontend, and FastAPI backend support.
- Model: `facebook/bart-base` via Hugging Face
- Dataset: `cnn_dailymail` v3.0.0 (`abisee/cnn_dailymail`)
- Training Data Used: 50K samples (scalable to full 287K; see the loading sketch after this list)
- Validation: 3K (notebook) / 1K (pipeline) (out of 13.4K)
- Test: 3K (notebook) / 1K (pipeline) (out of 11.5K)
- Evaluation: ROUGE-1: 25.21, ROUGE-2: 12.28, ROUGE-L: 20.67
- Frameworks: PyTorch, Hugging Face Transformers + Accelerate
- Deployment: Dockerized, Streamlit UI, FastAPI backend
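For illustration, a minimal sketch of how these splits can be drawn with the `datasets` library (variable names are assumptions, not the pipeline's actual code):

```python
from datasets import load_dataset

# Load CNN/DailyMail v3.0.0 from the Hugging Face Hub.
dataset = load_dataset("abisee/cnn_dailymail", "3.0.0")

# Subsample to the pipeline sizes: 50K train, 1K validation, 1K test.
train_ds = dataset["train"].select(range(50_000))
val_ds = dataset["validation"].select(range(1_000))
test_ds = dataset["test"].select(range(1_000))
```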
Implemented as modular stages with YAML-config-driven execution:
- Data Ingestion → `cnn_dailymail` load and split
- Data Validation → schema check, format verification
- Data Transformation → tokenization and formatting (see the sketch after this list)
- Model Training → with `Seq2SeqTrainer`
- Model Evaluation → on a custom held-out test set
- Prediction → CLI or UI-based summarization
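The transformation stage pairs each article with its reference summary. A minimal sketch of what that tokenization step can look like (function name and length limits are assumptions):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

def convert_examples_to_features(batch):
    # Articles are truncated to BART's 1024-token context window.
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    # Reference summaries ("highlights" in cnn_dailymail) become the labels.
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# tokenized = dataset.map(convert_examples_to_features, batched=True)
```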
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,   # small batch to fit a 4 GB GPU
    gradient_accumulation_steps=4,   # effective batch size of 8
    fp16=True,                       # mixed-precision training
    num_train_epochs=3,
    predict_with_generate=True,      # generate summaries during evaluation
)
```

Trained on a 4 GB RTX 3050 using Hugging Face Accelerate for speed and efficiency.
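A sketch of how these arguments might feed the training stage (dataset and collator names are assumptions; the actual wiring lives in the modular stage code):

```python
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer)

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
# Dynamic per-batch padding keeps memory usage low on a 4 GB GPU.
collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,   # tokenized splits from the transformation stage
    eval_dataset=val_ds,
    data_collator=collator,
)
trainer.train()
```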
Train Loss: 0.9688 | Global Steps: 18,750
Test Loss: 0.9473 | ROUGE-1: 25.21 | ROUGE-2: 12.28 | ROUGE-L: 20.67
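A minimal sketch of computing ROUGE with the `evaluate` library (tooling assumed; the toy strings stand in for decoded model outputs and reference highlights):

```python
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the cat sat on the mat"]       # decoded model summaries
references = ["a cat was sitting on the mat"]  # reference highlights
scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```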
Run training or prediction:

```bash
python main.py
```

Launch the Streamlit UI:

```bash
streamlit run frontend_app.py
```

Start the FastAPI backend:

```bash
uvicorn backend_app:app --reload
```
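A minimal sketch of what the backend endpoint could look like (route name, payload shape, and use of the base checkpoint via `pipeline` are assumptions, not the actual `backend_app.py`):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Assumption: in practice this would point at the fine-tuned checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-base")

class Article(BaseModel):
    text: str

@app.post("/summarize")
def summarize(article: Article):
    # Generation lengths are illustrative defaults.
    result = summarizer(article.text, max_length=128, min_length=30)
    return {"summary": result[0]["summary_text"]}
```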
Build and run the container:

```bash
docker build -t summarizer-app .
docker run -p 8000:8000 summarizer-app
```

This is a solo project but open to contributors. Feel free to raise issues, suggest features, or submit PRs 🚀
MIT License
- Add support for `peft` / LoRA fine-tuning (see the sketch after this list)
- Implement memory-efficient attention (FlashAttention)
- Deploy to Hugging Face Spaces or Streamlit Cloud
- Add multi-lingual summarization support
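If `peft`/LoRA support lands, the fine-tuning setup might look roughly like this (hyperparameters and target modules are illustrative, not a committed design):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
# Adapt only BART's attention projections; values here are illustrative.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction stays trainable
```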