Live Demo
Briefly is a lightweight, AI-powered ETL pipeline that pulls trending news headlines, summarizes them using Google's Gemini API, and displays them in a clean web app interface. It's built with Python, Streamlit, and GCP, ideal for showcasing real-time NLP and data engineering skills.
- Extract top news stories from Hacker News
- Summarize headlines using Gemini 1.5 Pro
- Display summaries in a dynamic Streamlit app
- Top navigation bar with Feed and Trending views
- Light/Dark theme toggle in the header
- Live date range and source filtering in the sidebar
- Preview logos for each article (with fallback)
- Optional support for BigQuery or CSV export
- Free-tier compatible (Google Gemini 1.5)
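The date range and source filters can be pictured as a plain predicate over article records. The sketch below is a minimal illustration, not the app's actual code: the `Article` record and the `filter_articles` function are hypothetical names, and only the `source` and `published_at` field names are borrowed from the summaries table this README describes.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional, Set


@dataclass
class Article:
    # Hypothetical record; field names mirror the summaries table.
    title: str
    source: str
    published_at: datetime


def filter_articles(
    articles: Iterable[Article],
    start: date,
    end: date,
    sources: Optional[Set[str]] = None,
) -> List[Article]:
    """Keep articles published within [start, end] and, if given, from `sources`."""
    kept = []
    for article in articles:
        in_range = start <= article.published_at.date() <= end
        source_ok = sources is None or article.source in sources
        if in_range and source_ok:
            kept.append(article)
    return kept
```

In a Streamlit app, `start`, `end`, and `sources` could come from sidebar widgets such as `st.sidebar.date_input` and `st.sidebar.multiselect`; whether `webapp/app.py` is wired this way is an assumption.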
- Python (ETL scripts)
- BigQuery (cloud data warehouse)
- Gemini API (summarization)
- Streamlit (web UI)
- Terraform (infra-as-code)
- Docker (optional for app deployment)
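To make the extract and summarize stages concrete, here is a rough sketch using only the standard library and the public Hacker News API. It is not the code in `etl/extract.py` or `etl/summarize.py`: the function names, the prompt wording, and the injectable `fetch` and `summarize` callables are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen

# Public Hacker News API (Firebase-hosted).
HN_BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str):
    """Download and decode a JSON document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def extract_top_stories(limit: int = 5, fetch: Callable = fetch_json) -> List[Dict]:
    """Return title/url pairs for the first `limit` top stories."""
    story_ids = fetch(f"{HN_BASE}/topstories.json")[:limit]
    stories = []
    for story_id in story_ids:
        # Deleted items can come back as null, so fall back to an empty dict.
        item = fetch(f"{HN_BASE}/item/{story_id}.json") or {}
        if item.get("title"):
            stories.append({"title": item["title"], "url": item.get("url", "")})
    return stories


def build_prompt(title: str) -> str:
    """Hypothetical prompt wording; the project's real prompt is not shown here."""
    return f"Summarize this news headline in one or two sentences: {title}"


def summarize_stories(stories: List[Dict], summarize: Callable[[str], str]) -> List[Dict]:
    """Attach a summary to each story using the supplied model call."""
    return [{**story, "summary": summarize(build_prompt(story["title"]))} for story in stories]
```

The `summarize` callable is where a Gemini request would go. Keeping it injectable means the flow can be exercised with a stub, without an API key or network access.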
briefly/
├── docker-compose.yaml
├── Dockerfile
├── etl
│   ├── __pycache__
│   ├── extract.py
│   ├── insert_sample_data.py
│   ├── list_models.py
│   ├── load.py
│   ├── run_pipeline.py
│   ├── setup_bigquery.py
│   ├── summarize.py
│   ├── test_bigquery.py
│   └── transform.py
├── LICENSE
├── notebooks
├── README.md
├── requirements.txt
├── terraform
│   ├── main.tf
│   ├── outputs.tf
│   ├── provider.tf
│   ├── terraform.tfstate
│   ├── terraform.tfstate.backup
│   ├── terraform.tfvars
│   └── variables.tf
├── venv
│   ├── bin
│   ├── etc
│   ├── include
│   ├── lib
│   ├── pyvenv.cfg
│   └── share
└── webapp
    └── app.py
To get started with this project, you'll need the following tools installed:
- Python 3.11+
- Terraform 1.3+
- Google Cloud SDK, required for authenticating with GCP and managing infrastructure
- Streamlit, installed via
pip install -r requirements.txt
- Clone the repo
- Create a .env file:
GEMINI_API_KEY=your-api-key-here
- Ensure your Google Cloud credentials are available:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account.json
export GCP_PROJECT=your-gcp-project-id
- Install dependencies:
pip install -r requirements.txt
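Before running the pipeline, it can help to confirm that the three settings named in the steps above are actually visible to Python. The check below is a small sketch, not part of the repository; `missing_settings` is a hypothetical helper name.

```python
import os
from typing import List, Mapping

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("GEMINI_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no arguments inspects the current process environment. Values in a .env file only show up there once something loads them; `load_dotenv()` from python-dotenv is a common choice, though whether Briefly relies on it is an assumption.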
# Create and activate your virtual environment (if needed)
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run full ETL pipeline (extract, summarize, and load into BigQuery)
python etl/run_pipeline.py
# Launch the frontend dashboard
streamlit run webapp/app.py
If you want to store and analyze summaries in BigQuery:
- Set your GCP credentials and project ID as environment variables:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json
export GCP_PROJECT=your-gcp-project-id
- Run the setup script to create the dataset and table:
python etl/setup_bigquery.py
- Use etl/run_pipeline.py to automatically push new summaries to BigQuery.
Summaries are stored in the briefly_data.summaries table with fields like url, title, summary, source, published_at, and summarized_at.
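One way to picture a row in that table is as a small record type. The sketch below uses the field names listed above; the Python types, the `SummaryRow` class, and the `to_json_row` method are assumptions, since the actual schema lives in `etl/setup_bigquery.py` and the Terraform config.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class SummaryRow:
    # Field names come from the README; the types are assumptions.
    url: str
    title: str
    summary: str
    source: str
    published_at: datetime
    summarized_at: datetime

    def to_json_row(self) -> Dict[str, str]:
        """Serialize to a JSON-friendly dict with ISO 8601 timestamps."""
        row = asdict(self)
        row["published_at"] = self.published_at.isoformat()
        row["summarized_at"] = self.summarized_at.isoformat()
        return row


example = SummaryRow(
    url="https://example.com/story",
    title="Example headline",
    summary="One-sentence summary.",
    source="Hacker News",
    published_at=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    summarized_at=datetime(2024, 5, 1, 12, 5, tzinfo=timezone.utc),
)
```

A list of such dicts is the shape accepted by `insert_rows_json` in the google-cloud-bigquery client. Whether `etl/load.py` uses streaming inserts or a load job is not stated here.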
You can provision the required GCP infrastructure using Terraform:
- Navigate to the Terraform directory:
cd terraform/
- Set your environment credentials (if not already set):
export GOOGLE_APPLICATION_CREDENTIALS=./.secrets/terraform-admin-key.json
- Initialize the Terraform project:
terraform init
- Review the plan:
terraform plan
- Apply the infrastructure:
terraform apply
Terraform will create:
- A BigQuery dataset and summaries table
- A service account with bigquery.user permissions
- GitHub Actions CI/CD validation pipeline
To tear down all Terraform-managed resources:
terraform destroy
This will prompt you to confirm deletion of all provisioned infrastructure.
For team collaboration and state consistency, configure a remote backend using Google Cloud Storage (GCS):
- Create a GCS bucket (e.g. briefly-terraform-state)
- Enable versioning on the bucket:
gsutil versioning set on gs://briefly-terraform-state
- Add a backend config to your provider.tf or main.tf:
terraform {
  backend "gcs" {
    bucket = "briefly-terraform-state"
    prefix = "terraform/state"
  }
}
- Reinitialize Terraform to migrate local state:
terraform init -migrate-state
This ensures your Terraform state is versioned, backed up, and team-ready.
MIT: free to use, extend, and showcase.
This project is complete and production-ready. Further improvements (e.g. CI deployment, testing automation, or remote backends) can be added as future enhancements.