TrendNest is a modular, AI-integrated data pipeline that extracts, cleans, models, and visualizes time-based trends. It includes Gemini 1.5 summarization, CSV export, and a dashboard UI. Built with Python, SQL, and BigQuery support, and fully dockerized for deployment; designed for data engineering and analytics portfolios.


TrendNest

Badges: repo size, last commit, license, Python version, build status, Docker, GCP BigQuery, Gemini AI summarization, Streamlit app.

TrendNest is a portfolio-ready data pipeline and dashboard project that ingests, transforms, models, and visualizes data trends over time. It integrates AI summarization using Gemini 1.5 and supports exporting cleaned data to CSV. The project is fully containerized and deployable.

🔧 Features

  • Data extraction from various sources (e.g. APIs, databases, files)
  • Transformation pipeline via configurable "recipe"
  • Time-based trend modeling
  • AI-generated summaries using Gemini 1.5
  • Interactive dashboard built with Streamlit (or Dash)
  • CSV downloads of processed data
  • Dockerized for deployment

🗂 Project Structure

TrendNest/
├── dags/                      # Airflow DAGs (optional)
├── dashboard/                 # Streamlit dashboard app
│   └── app.py                 # Main UI script
├── data/                      # Local and processed data
│   ├── cleaned_data.csv       # Output from pipeline
│   └── sample.csv             # Example input data
├── docker/                    # Containerization setup
│   └── Dockerfile             # Docker build instructions
├── docs/                      # Documentation and notes
│   └── design.md              # System design outline
├── notebooks/                 # Jupyter notebooks (EDA, prototyping)
├── sql/                       # BigQuery-compatible SQL queries
│   ├── monthly_averages.sql   # Avg monthly close/volume
│   ├── latest_prices.sql      # Most recent close prices
│   └── volume_spikes.sql      # High-volume trading days
├── src/                       # Core data pipeline logic
│   ├── __init__.py
│   ├── config.py              # Config constants
│   ├── extract.py             # Local/CSV data extraction
│   ├── extract_stocks.py      # yfinance stock extractor
│   ├── transform.py           # Data cleaning
│   ├── model.py               # Trend modeling
│   ├── summarize.py           # Gemini AI summaries
│   ├── export.py              # CSV export
│   └── upload.py              # BigQuery uploader
├── test_https.py              # API connectivity test
├── test_upload.py             # BigQuery upload test
├── test_yfinance_fetch.py     # yfinance fetch test
├── tests/                     # Unit tests (placeholder)
├── run_pipeline.py            # Main pipeline runner
├── requirements.txt           # Python dependencies
├── .env.example               # Sample environment variables (copy to .env)
├── .gitignore                 # Git exclusions
└── README.md                  # This file

🚀 Getting Started

  1. Clone the repo:

    git clone https://github.com/Peippo1/TrendNest.git
    cd TrendNest
    
  2. Set up your environment:

    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  3. Copy .env.example to .env, then fill in your own credentials. Keep .env out of version control.

  4. (Optional) Set up observability:

    • LOG_LEVEL controls verbosity (default INFO).
    • To emit OpenTelemetry traces/metrics to a collector, set OTEL_EXPORTER_OTLP_ENDPOINT (HTTP/OTLP) and optional OTEL_EXPORTER_OTLP_HEADERS for auth. Without it, spans are printed to stdout and metrics stay local.
    • ENVIRONMENT tags spans/metrics (e.g., dev, staging, prod).
    • TOP_PERFORMERS_LIMIT and TICKERS_UNIVERSE let you tune the ticker selection.
    • Resilience knobs: MAX_WORKERS, FETCH_TIMEOUT_SECONDS, FETCH_MAX_RETRIES, FETCH_BACKOFF_SECONDS, FETCH_PERIOD, FETCH_INTERVAL, and DEAD_LETTER_PATH for failed rows.
  5. Run the pipeline:

    python run_pipeline.py
    
  6. Start the dashboard:

    streamlit run dashboard/app.py
    

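To make step 4 concrete, here is a minimal .env sketch. The variable names are taken from the list above; every value shown is a placeholder for illustration, not a recommended setting:

```shell
# Logging and environment tagging
LOG_LEVEL=INFO
ENVIRONMENT=dev

# Optional OTLP collector; leave unset to print spans to stdout
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
# OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer <token>

# Ticker selection
TICKERS_UNIVERSE=AAPL,MSFT,GOOG
TOP_PERFORMERS_LIMIT=5

# Fetch resilience
MAX_WORKERS=4
FETCH_TIMEOUT_SECONDS=30
FETCH_MAX_RETRIES=3
FETCH_BACKOFF_SECONDS=1
FETCH_PERIOD=6mo
FETCH_INTERVAL=1d
DEAD_LETTER_PATH=data/failed_rows.csv
```
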
Command-line overrides:

python run_pipeline.py --tickers AAPL,MSFT --limit 5 --period 1mo --interval 1d --export-path /tmp/output.csv --dead-letter-path /tmp/failed.csv

Use a YAML config file to override settings:

python run_pipeline.py --config config.yaml
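
As a sketch, such a file might mirror the CLI flags above; the key names below are an assumption about the supported schema, not documented options:

```yaml
tickers: [AAPL, MSFT]
limit: 5
period: 1mo
interval: 1d
export_path: /tmp/output.csv
dead_letter_path: /tmp/failed.csv
```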

๐Ÿ” Observability + metrics

  • Tracing: pipeline run → per-ticker spans + downstream HTTP (requests/yfinance) via OpenTelemetry.
  • Metrics: counters for runs, tickers processed, and rows processed (trendnest.pipeline.*). They export via OTLP if configured, else stay in-process.
  • Logs: structured logging with run_id on key entries; adjust LOG_LEVEL as needed.
  • Resilience: bounded retries with jitter, timeouts on fetches, concurrent ticker processing (MAX_WORKERS), and a dead-letter CSV for failures.
  • Metrics expanded: fetch latency histogram (trendnest.pipeline.fetch_latency_seconds) and retry/failure counters.
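
The resilience behavior described above (bounded retries with jitter and backoff) can be sketched in plain Python. This is an illustrative helper, not the repo's actual fetch code; `fetch_with_retries` and its parameters are hypothetical names that mirror the `FETCH_*` environment variables:

```python
import random
import time


def fetch_with_retries(fetch, *, max_retries=3, backoff_seconds=1.0):
    """Call fetch() with bounded retries and jittered exponential backoff.

    Illustrative only: the real pipeline reads these knobs from
    FETCH_MAX_RETRIES / FETCH_BACKOFF_SECONDS and applies per-request
    timeouts at the HTTP layer.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:  # real code should catch narrower errors
            last_error = exc
            if attempt == max_retries:
                break
            # Exponential backoff with multiplicative jitter to avoid
            # synchronized retry storms across concurrent workers.
            delay = backoff_seconds * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
    raise last_error
```

Failures that exhaust their retries are the rows that would land in the dead-letter CSV.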

🧪 Testing & CI

  • Run tests locally: python -m pip install -r requirements-dev.txt && pytest -q
  • GitHub Actions workflow (.github/workflows/ci.yml) runs tests on pushes/PRs to main.

๐Ÿ›ก๏ธ Security

  • Keep secrets out of git; use .env.example as a template and prefer cloud secret storage.
  • See SECURITY.md for reporting guidance and hygiene tips.

🧠 AI Summarization (Gemini 1.5)

TrendNest integrates Gemini 1.5 to generate natural language summaries of key insights in your trend data. This makes the dashboard useful to both technical and non-technical stakeholders.

Example summary output:

"Apple's stock (AAPL) shows a general upward trend from December 2024 to June 2025, increasing from ~$172 to ~$258. Trading volume spiked in June, suggesting heightened investor interest."
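
A summarization step like this typically computes simple trend statistics and folds them into a prompt before calling the model. Below is a minimal, hypothetical sketch of that prompt-building stage; `build_summary_prompt` is not the repo's actual API, and the Gemini 1.5 call that would consume the prompt is omitted:

```python
def build_summary_prompt(ticker: str, dates: list[str], closes: list[float]) -> str:
    """Turn a cleaned price series into a natural-language prompt.

    Hypothetical helper: the repo's src/summarize.py may structure this
    differently; the resulting prompt would be sent to Gemini 1.5.
    """
    start, end = closes[0], closes[-1]
    if end > start:
        direction = "upward"
    elif end < start:
        direction = "downward"
    else:
        direction = "flat"
    return (
        f"Summarize the stock trend for {ticker} between {dates[0]} and {dates[-1]}. "
        f"The closing price moved from ${start:.2f} to ${end:.2f} (a {direction} trend). "
        "Write two plain-English sentences for a non-technical audience."
    )
```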

๐Ÿ—ƒ๏ธ BigQuery Integration

TrendNest supports uploading cleaned trend data to Google BigQuery. This enables:

  • SQL-based analysis
  • Historical trend aggregation
  • Integration with Looker Studio or other BI tools

Each run appends to the trendnest.cleaned_stock_data table using a service account key.

🧮 SQL Querying Example

Once data is in BigQuery, you can run SQL like:

SELECT
  FORMAT_DATE('%Y-%m', PARSE_DATE('%Y-%m-%d', date)) AS month,
  ROUND(AVG(CAST(Close AS FLOAT64)), 2) AS avg_close,
  ROUND(AVG(CAST(Volume AS INT64))) AS avg_volume
FROM `trendnest-463421.trendnest.cleaned_stock_data`
WHERE Ticker = 'AAPL'
GROUP BY month
ORDER BY month;

📂 Included SQL Files

The /sql/ directory contains reusable queries for analytics and dashboarding:

  • monthly_averages.sql: Calculates average monthly closing price and trading volume
  • latest_prices.sql: Retrieves the most recent closing price for each ticker
  • volume_spikes.sql: Identifies unusually high trading volume days

These can be run in BigQuery or loaded into the dashboard for insights.


๐Ÿณ Docker Support

Build and run the container (the Dockerfile lives under docker/, per the project structure above):

docker build -f docker/Dockerfile -t trendnest .
docker run -p 8501:8501 trendnest

📄 License

MIT — free to use, modify, and distribute.

📦 Changelog

v1.1.0

  • Integrated Gemini 1.5 for AI-generated summaries
  • Implemented BigQuery upload via service account
  • Enabled SQL querying and Looker Studio compatibility

v1.2.0

  • Multi-ticker support added with interactive dashboard controls
  • Upgraded Streamlit dashboard with Altair charts (line and bar)
  • Dynamic filtering and AI summaries per selected ticker
  • Enhanced CSV export for selected tickers and date ranges
  • Improved dashboard responsiveness and readability
