Text summarization reduces a document to a shorter version that retains its key points. Extractive summarization selects important content directly from the source.

yxshee/summarization-nlp

๐Ÿ“ Summarization NLP: AI-Powered Text Distillation ๐Ÿš€

License: MIT Python 3.8+ Hugging Face PyTorch

"From Information Overload to Insightful Clarity" โœจ

Demo

🌟 Features

  • 🎯 Abstractive Summarization - Generate human-like summaries with novel phrasing
  • 📏 Length Control - Customize summary length via simple parameters
  • 🌍 Multilingual Support - Process text in 44 languages (XL-Sum dataset)
  • ⚡ API Ready - REST endpoints for seamless integration
  • 🖥️ Interactive Demo - Web interface for instant experimentation

📚 Table of Contents

  1. 📌 Project Overview
  2. 📊 Dataset Insights
  3. 🧠 Model Architecture
  4. 📈 Performance Evaluation
  5. ⚙️ Installation Guide
  6. 🚀 Quick Start
  7. 🌐 Deployment Options
  8. 🔮 Future Roadmap
  9. 🤝 Contribution Guidelines
  10. 📜 License

📌 Project Overview

In an age of information overload, Summarization NLP acts as your AI-powered lens 🔍 to focus on what matters. Key capabilities:

✅ Convert lengthy documents to concise insights
✅ Maintain original meaning through abstractive generation
✅ Handle multiple languages effortlessly
✅ Integrate via API into existing workflows

Explore Model on Hugging Face 🤗


📊 Dataset Insights

📦 XL-Sum Dataset Structure

Dataset({
    features: ['id', 'article', 'summary'],
    num_rows: 300000
})
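
The three fields above map naturally onto a sequence-to-sequence training pair. The following is a minimal sketch, not the repository's actual preprocessing: it assumes each record is a plain dict with the `id`, `article`, and `summary` keys listed above, and that the article is prefixed with `summarize: `, the task prefix T5 checkpoints conventionally use for summarization. The names `to_training_pair` and `TASK_PREFIX` are hypothetical.

```python
from typing import Dict, Tuple

# T5 is a text-to-text model; summarization is conventionally signalled by
# this prefix. Whether this repository uses it is an assumption.
TASK_PREFIX = "summarize: "


def to_training_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Turn one dataset record into a (source, target) text pair."""
    # Collapse runs of whitespace so stray newlines do not waste tokens.
    article = " ".join(record["article"].split())
    summary = " ".join(record["summary"].split())
    return TASK_PREFIX + article, summary


if __name__ == "__main__":
    example = {
        "id": "demo-0",  # illustrative id, not a real dataset row
        "article": "AI advancements  revolutionize\nhealthcare diagnostics...",
        "summary": "Healthcare transformed by AI-driven diagnostic breakthroughs.",
    }
    source, target = to_training_pair(example)
    print(source)
    print(target)
```

The `id` field is carried along in the record but is not fed to the model; it only identifies the row.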

๐Ÿ“ Sample Data

Article Excerpt Generated Summary
"Recent stock market volatility linked to geopolitical tensions..." "Geopolitical tensions cause stock market fluctuations, prompting investor caution."
"AI advancements revolutionize healthcare diagnostics..." "Healthcare transformed by AI-driven diagnostic breakthroughs."

🧠 Model Architecture

T5 Transformer Overview

graph TD
    A[Input Text] --> B(T5 Encoder)
    B --> C[Latent Representation]
    C --> D(T5 Decoder)
    D --> E[Generated Summary]

๐Ÿ‹๏ธ Training Parameters

Component Specification
Base Model T5-Small
Optimizer AdamW (lr=3e-5)
Batch Size 16
Training Epochs 5
Max Sequence Length 512 tokens
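
To get a feel for what these settings imply, the sketch below derives the number of optimizer steps. It assumes all 300,000 rows reported under Dataset Insights are used for training and that there is no gradient accumulation; neither is stated here, so treat the totals as an upper-bound estimate. `TrainingConfig` is a hypothetical container, not a class from this repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    base_model: str = "t5-small"
    learning_rate: float = 3e-5  # AdamW learning rate
    batch_size: int = 16
    epochs: int = 5
    max_seq_length: int = 512    # tokens

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch may be partial, hence the ceiling.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        return self.steps_per_epoch(num_examples) * self.epochs


if __name__ == "__main__":
    cfg = TrainingConfig()
    n = 300_000  # assumption: every row is a training example
    print(cfg.steps_per_epoch(n))  # 18750
    print(cfg.total_steps(n))      # 93750
```

If part of the data is held out for validation, the real step count is proportionally lower.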

📈 Performance Evaluation

📊 ROUGE Scores

| Metric | Score |
| --- | --- |
| ROUGE-1 | 0.238 |
| ROUGE-2 | 0.056 |
| ROUGE-L | 0.122 |
| ROUGE-Lsum | 0.155 |
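
ROUGE-N measures n-gram overlap between a generated summary and a reference. As an illustration of how such a score arises, here is a simplified ROUGE-1/ROUGE-2 F1 computation applied to the generated/reference pair from the Sample Comparison in this section. It is a sketch under simplifying assumptions (lowercasing, alphanumeric tokenization, no stemming), so its numbers will not match an official ROUGE implementation exactly, and it is not the evaluation code behind the scores listed above.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs only (a simplification).
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


GENERATED = "Rapid Arctic ice melt highlights accelerating climate change impacts."
REFERENCE = "Scientists report record Arctic ice loss due to climate change."

if __name__ == "__main__":
    print(round(rouge_n_f1(GENERATED, REFERENCE, 1), 3))  # 0.421
    print(round(rouge_n_f1(GENERATED, REFERENCE, 2), 3))  # 0.235
```

The pair shares four unigrams ("arctic", "ice", "climate", "change") and two bigrams, which is where the two values come from. A single hand-picked pair is not comparable to corpus-level averages.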

๐Ÿ” Sample Comparison

Input:
"Climate change impacts accelerate, with unprecedented Arctic ice melt reported..."

Generated Summary:
"Rapid Arctic ice melt highlights accelerating climate change impacts."

Reference Summary:
"Scientists report record Arctic ice loss due to climate change."


โš™๏ธ Installation Guide

System Requirements

  • Python 3.8+
  • 8GB+ RAM
  • 2GB+ Free Disk Space

Setup Instructions

# Clone repository
git clone https://github.com/yxshee/summarization-nlp.git
cd summarization-nlp

# Create virtual environment
python -m venv .env
source .env/bin/activate  # Windows: .env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download model (optional - will auto-download on first run)
python -c "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM; \
AutoTokenizer.from_pretrained('t5-small'); \
AutoModelForSeq2SeqLM.from_pretrained('t5-small')"

🚀 Quick Start

Python API Usage

from summarizer import TextProcessor

processor = TextProcessor()
article = """[Insert long article text here]..."""

# Generate summary
summary = processor.summarize(
    text=article,
    max_length=150,  # 🎚️ Control summary length
    temperature=0.7  # 🎛️ Adjust creativity
)

print(f"📝 Summary:\n{summary}")
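
`TextProcessor` is specific to this repository and its internals are not shown here. As a rough guide to what the two parameters do, the sketch below calls the `t5-small` checkpoint directly through Hugging Face `transformers`. The helper names (`generation_kwargs`, `summarize_with_t5`), the `summarize: ` prefix, and the mapping of `temperature` onto sampling are assumptions, not the project's confirmed implementation.

```python
from typing import Any, Dict


def generation_kwargs(max_length: int = 150, temperature: float = 0.7) -> Dict[str, Any]:
    """Translate the two user-facing knobs into `generate()` arguments."""
    return {
        "max_length": max_length,    # upper bound on summary length, in tokens
        "do_sample": True,           # temperature only has an effect when sampling
        "temperature": temperature,  # lower values give less varied output
    }


def summarize_with_t5(text: str, max_length: int = 150, temperature: float = 0.7) -> str:
    # Imported lazily so the helper above works without the model stack installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Truncate to the 512-token limit listed under Training Parameters.
    inputs = tokenizer(
        "summarize: " + text,  # assumed task prefix
        return_tensors="pt",
        max_length=512,
        truncation=True,
    )
    output_ids = model.generate(**inputs, **generation_kwargs(max_length, temperature))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize_with_t5(article)` downloads the checkpoint on first use. Because sampling is enabled, repeated calls on the same input can return different summaries.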

Command Line Interface

python cli.py --text "Your input text here" --length 100
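
The contents of `cli.py` are not reproduced here. A minimal sketch of how the two flags above could be parsed with the standard library's `argparse` follows; the default length of 150 and the `build_parser` helper are assumptions for illustration only.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Summarize a piece of text.")
    parser.add_argument("--text", required=True, help="Text to summarize.")
    parser.add_argument("--length", type=int, default=150,
                        help="Target summary length (assumed default).")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.length <= 0:
        raise SystemExit("--length must be a positive integer")
    # A real CLI would pass args.text and args.length to the summarizer here.
    return args


if __name__ == "__main__":
    # Explicit argv keeps the demo self-contained; pass None to read sys.argv.
    demo = main(["--text", "Your input text here", "--length", "100"])
    print(demo.text, demo.length)
```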

๐ŸŒ Deployment Options

๐Ÿณ Docker Deployment

FROM python:3.10-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8501
CMD ["streamlit", "run", "code/app.py", "--server.port=8501"]

โ˜๏ธ Cloud Deployment

  1. AWS SageMaker
  2. Google AI Platform
  3. Azure ML Services

🔮 Future Roadmap

  • 🌍 Enhanced Multilingual Support
  • ⚡ Real-Time Streaming API
  • 🧩 Modular Architecture
  • 📊 Advanced Analytics Dashboard
  • 🔍 Explainable AI Features

๐Ÿค Contribution Guidelines

We Welcome:
๐Ÿ”ง Code Contributions
๐Ÿ› Bug Reports
๐Ÿ’ก Feature Requests
๐Ÿ“– Documentation Improvements

First Time? Try our good-first-issue labeled tasks!


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by YXSHEE | 📚 Transform Text into Knowledge!
