πŸ“° Briefly

πŸ”— Live Demo

Briefly is a lightweight, AI-powered ETL pipeline that pulls trending news headlines, summarizes them using Google's Gemini API, and displays them in a clean web app interface. It's built with Python, Streamlit, and GCP β€” ideal for showcasing real-time NLP + data engineering skills.

πŸš€ Features

  • Extract top news stories from Hacker News
  • Summarize headlines using Gemini 1.5 Pro
  • Display summaries in a dynamic Streamlit app
  • Top navigation bar with Feed and Trending views
  • Light/Dark theme toggle in the header
  • Live date range and source filtering in the sidebar
  • Preview logos for each article (with fallback)
  • Optional support for BigQuery or CSV export
  • Free-tier compatible (Google Gemini 1.5)
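The extraction step can be sketched against the public Hacker News Firebase API. The function names below are illustrative, not the repo's actual `etl/extract.py` code:

```python
import json
import urllib.request

HN_API = "https://hacker-news.firebaseio.com/v0"

def top_story_ids(limit=10):
    """Return the ids of the current top stories, truncated to `limit`."""
    with urllib.request.urlopen(f"{HN_API}/topstories.json") as resp:
        return json.load(resp)[:limit]

def fetch_story(story_id):
    """Fetch a single item record (title, url, score, time) by id."""
    with urllib.request.urlopen(f"{HN_API}/item/{story_id}.json") as resp:
        return json.load(resp)

# Example usage (requires network access):
# for sid in top_story_ids(limit=5):
#     print(fetch_story(sid).get("title"))
```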

🧱 Tech Stack

  • Python (ETL scripts)
  • BigQuery (cloud data warehouse)
  • Gemini API (summarization)
  • Streamlit (web UI)
  • Terraform (infra-as-code)
  • Docker (optional for app deployment)

πŸ“‚ Project Structure

briefly/
β”œβ”€β”€ docker-compose.yaml
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ etl
β”‚   β”œβ”€β”€ __pycache__
β”‚   β”œβ”€β”€ extract.py
β”‚   β”œβ”€β”€ insert_sample_data.py
β”‚   β”œβ”€β”€ list_models.py
β”‚   β”œβ”€β”€ load.py
β”‚   β”œβ”€β”€ run_pipeline.py
β”‚   β”œβ”€β”€ setup_bigquery.py
β”‚   β”œβ”€β”€ summarize.py
β”‚   β”œβ”€β”€ test_bigquery.py
β”‚   └── transform.py
β”œβ”€β”€ LICENSE
β”œβ”€β”€ notebooks
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ terraform
β”‚   β”œβ”€β”€ main.tf
β”‚   β”œβ”€β”€ outputs.tf
β”‚   β”œβ”€β”€ provider.tf
β”‚   β”œβ”€β”€ terraform.tfstate
β”‚   β”œβ”€β”€ terraform.tfstate.backup
β”‚   β”œβ”€β”€ terraform.tfvars
β”‚   └── variables.tf
β”œβ”€β”€ venv
β”‚   β”œβ”€β”€ bin
β”‚   β”œβ”€β”€ etc
β”‚   β”œβ”€β”€ include
β”‚   β”œβ”€β”€ lib
β”‚   β”œβ”€β”€ pyvenv.cfg
β”‚   └── share
└── webapp
    └── app.py

🛠 System Requirements

To get started with this project, you'll need the following tools installed:

  • Python 3 (ETL scripts and Streamlit app)
  • Google Cloud SDK (gcloud/gsutil, for BigQuery and GCS access)
  • Terraform (infrastructure provisioning)
  • Docker (optional, for containerized app deployment)

πŸ”‘ Environment Setup

  1. Clone the repo
  2. Create a .env file:
    GEMINI_API_KEY=your-api-key-here
    
  3. Ensure your Google Cloud credentials are available:
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account.json
    export GCP_PROJECT=your-gcp-project-id
    

    (Required for BigQuery integration)

  4. Install dependencies:
    pip install -r requirements.txt
    

πŸ§ͺ Run Locally

# Create and activate your virtual environment (if needed)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run full ETL pipeline (extract, summarize, and load into BigQuery)
python etl/run_pipeline.py

# Launch the frontend dashboard
streamlit run webapp/app.py
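The summarize step can be sketched with the `google-generativeai` client. The prompt wording and function names are illustrative (not the repo's `etl/summarize.py`), and the import is deferred so only the API call needs the package installed:

```python
def build_prompt(title, url):
    """Compose an illustrative summarization prompt for one headline."""
    return (
        "Summarize the following tech headline in one or two sentences.\n"
        f"Title: {title}\nURL: {url}"
    )

def summarize(title, url, api_key):
    """Send the prompt to Gemini 1.5 Pro and return the summary text."""
    import google.generativeai as genai  # requires the google-generativeai package

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content(build_prompt(title, url)).text
```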

πŸ“‘ BigQuery Integration

If you want to store and analyze summaries in BigQuery:

  1. Set your GCP credentials and project ID as environment variables:
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json
    export GCP_PROJECT=your-gcp-project-id
  2. Run the setup script to create the dataset and table:
    python etl/setup_bigquery.py
  3. Use etl/run_pipeline.py to automatically push new summaries to BigQuery.

Summaries are stored in the briefly_data.summaries table with fields like url, title, summary, source, published_at, and summarized_at.
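Once populated, the table can be queried with the `google-cloud-bigquery` client. The helper below is a sketch, not the repo's code; the import is deferred so building the SQL string doesn't require the package:

```python
def recent_summaries_sql(table="briefly_data.summaries", limit=20):
    """Build SQL for the most recently summarized articles."""
    return (
        "SELECT url, title, summary, source, published_at, summarized_at "
        f"FROM `{table}` ORDER BY summarized_at DESC LIMIT {limit}"
    )

def fetch_recent(project_id, limit=20):
    """Run the query and return each row as a plain dict."""
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    client = bigquery.Client(project=project_id)
    rows = client.query(recent_summaries_sql(limit=limit)).result()
    return [dict(row) for row in rows]
```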

πŸ—οΈ Terraform Infrastructure

You can provision the required GCP infrastructure using Terraform:

  1. Navigate to the Terraform directory:

    cd terraform/
  2. Set your environment credentials (if not already):

    export GOOGLE_APPLICATION_CREDENTIALS=./.secrets/terraform-admin-key.json
  3. Initialize the Terraform project:

    terraform init
  4. Review the plan:

    terraform plan
  5. Apply the infrastructure:

    terraform apply

Terraform will create:

  • A BigQuery dataset and summaries table
  • A service account with bigquery.user permissions
  • GitHub Actions CI/CD validation pipeline

🧹 Terraform Cleanup and Remote Backend (Optional)

Destroy Infrastructure

To tear down all Terraform-managed resources:

terraform destroy

This will prompt you to confirm deletion of all provisioned infrastructure.


Use a Remote Backend (Optional but Recommended)

For team collaboration and state consistency, configure a remote backend using Google Cloud Storage (GCS):

  1. Create a GCS bucket (e.g. briefly-terraform-state)

  2. Enable versioning on the bucket:

    gsutil versioning set on gs://briefly-terraform-state
  3. Add a backend config to your provider.tf or main.tf:

terraform {
  backend "gcs" {
    bucket  = "briefly-terraform-state"
    prefix  = "terraform/state"
  }
}
  4. Reinitialize Terraform to migrate local state:

    terraform init -migrate-state

This ensures your Terraform state is versioned, backed up, and team-ready.

πŸ“œ License

MIT β€” free to use, extend, and showcase.

βœ… Project Status

This project is complete and production-ready. Further improvements (e.g. CI deployment, testing automation, or remote backends) can be added as future enhancements.
