🇨🇦 Verity - Finding Truth Within the Noise


A Retrieval-Augmented Generation (RAG) system for analyzing 3M+ Reddit posts from r/Canada

Built for AI Hackathon in the North 2026

Features · Demo · Architecture · Setup · Tech Stack


🎯 What is Verity?

Verity is an AI-powered platform that helps users understand Canadian public discourse by analyzing millions of Reddit posts. Using advanced RAG (Retrieval-Augmented Generation) technology with Groq's ultra-fast LLM inference, it retrieves relevant discussions and generates insightful answers backed by real community conversations.

🌟 Key Highlights

  • 🔍 Semantic Search - Vector similarity search across 2.97M+ Reddit posts
  • ⚡ Lightning Fast - Powered by Groq's llama-3.3-70b-versatile (sub-second for cached queries)
  • 🤖 AI Insights - Context-aware answer generation with source attribution
  • 📊 Real-time Stats - Live API usage tracking and dataset statistics
  • 🎨 Beautiful UI - Modern React interface with gradient glassmorphic design

🚀 Live Demo

Backend API

🔗 https://shams0026-canadaconvo-backend.hf.space

Try the query endpoint:

Example Query

curl -X POST "https://shams0026-canadaconvo-backend.hf.space/api/v1/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are Canadians saying about healthcare?",
    "top_k": 10
  }'
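The same query can be issued from Python. This is a minimal sketch using only the standard library against the public endpoint above; the payload fields match the documented API, but the helper names are illustrative, not part of the project:

```python
import json
import urllib.request

API_URL = "https://shams0026-canadaconvo-backend.hf.space/api/v1/query"

def build_query_payload(query: str, top_k: int = 10) -> dict:
    """Build the JSON body documented for POST /api/v1/query."""
    return {"query": query, "top_k": top_k}

def ask_verity(query: str, top_k: int = 10) -> dict:
    """POST the query to the live backend and return the parsed JSON response."""
    data = json.dumps(build_query_payload(query, top_k)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (performs a live network call):
#   result = ask_verity("What are Canadians saying about healthcare?")
#   print(result["answer"])
```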

✨ Features

🎯 Core Capabilities

🔍 Intelligent Search

  • Semantic understanding of natural language queries
  • Vector similarity search using sentence-transformers
  • Context-aware retrieval from 2.97M posts
  • Relevance scoring and ranking

🤖 AI-Powered Analysis

  • Groq API integration for ultra-fast inference
  • Gemini API fallback for reliability
  • Source-backed answers with citations
  • Comprehensive sentiment analysis

⚡ Performance Optimized

  • Response caching for instant results
  • Efficient batch processing
  • ChromaDB for fast vector operations
  • Sub-second response times for cached queries

🎨 Modern Interface

  • Gradient glassmorphic design
  • Real-time statistics dashboard
  • Responsive layout for all devices
  • Smooth animations and transitions

🏗️ Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│                    FRONTEND (React + Vite)                  │
│  ┌────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Chat     │  │   Message    │  │    Source    │       │
│  │ Interface  │→ │   Display    │→ │    Cards     │       │
│  └────────────┘  └──────────────┘  └──────────────┘       │
│         ↓                                                   │
│    API Client (services/api.ts)                            │
└─────────────────────────────────────────────────────────────┘
                          ↓ HTTPS
┌─────────────────────────────────────────────────────────────┐
│              BACKEND (FastAPI on HF Space)                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │            API Layer (/api/v1)                       │  │
│  │   /query  |  /health  |  /stats                     │  │
│  └──────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                  RAG Engine                          │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐    │  │
│  │  │  Embed     │→ │  Retrieve  │→ │  Generate  │    │  │
│  │  │  Query     │  │  Context   │  │  Response  │    │  │
│  │  └────────────┘  └────────────┘  └────────────┘    │  │
│  └──────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  Groq API    │  │   ChromaDB   │  │ Embeddings   │    │
│  │ (Primary)    │  │  Vector DB   │  │   Service    │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
└─────────────────────────────────────────────────────────────┘

RAG Pipeline Flow

Step-by-Step:

  1. Query Embedding → Convert user query to 384-dim vector using sentence-transformers
  2. Vector Search → Find top-K similar posts from 2.97M embeddings in ChromaDB
  3. Context Building → Structure retrieved posts into a comprehensive prompt
  4. LLM Generation → Groq's llama-3.3-70b generates insights from context
  5. Response Formatting → Return answer with source citations and metadata
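In miniature, steps 1-2 reduce to embedding the query and ranking stored vectors by similarity. The toy sketch below uses hand-made 4-dim vectors standing in for the real 384-dim sentence-transformers embeddings stored in ChromaDB; all names here are illustrative, not the project's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    """Return the ids of the k posts most similar to the query vector."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [post_id for post_id, _ in scored[:k]]

# Toy "embeddings": in Verity these come from all-MiniLM-L6-v2 and live in ChromaDB.
corpus = {
    "post_healthcare": [0.9, 0.1, 0.0, 0.0],
    "post_housing":    [0.1, 0.9, 0.0, 0.0],
    "post_weather":    [0.0, 0.0, 1.0, 0.0],
}
query = [0.8, 0.2, 0.0, 0.0]  # a "healthcare"-like query vector
print(top_k(query, corpus, k=2))  # most similar post first
```

Steps 3-5 then format the retrieved posts into a prompt, call the LLM, and attach the posts as citations.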

🛠️ Tech Stack

Backend

| Technology | Purpose | Version |
| --- | --- | --- |
| FastAPI | Web framework | 0.104.1 |
| ChromaDB | Vector database | 0.4.18 |
| Groq API | LLM inference (Primary) | llama-3.3-70b-versatile |
| Gemini API | LLM inference (Fallback) | gemini-2.5-flash |
| sentence-transformers | Text embeddings | all-MiniLM-L6-v2 (384-dim) |
| Uvicorn | ASGI server | Latest |
| Pydantic | Data validation | 2.x |
| Hugging Face Spaces | Deployment | Docker SDK |

Frontend

| Technology | Purpose | Version |
| --- | --- | --- |
| React | UI framework | 19.2 |
| TypeScript | Type safety | 5.9 |
| Vite | Build tool | 7.2.4 |
| Tailwind CSS | Styling | 3.4.1 |
| Lucide React | Icons | Latest |

Data Pipeline

  • Python scripts for ETL
  • Pandas & NumPy for data processing
  • ChromaDB persistent vector store (16GB)
  • Pre-computed embeddings for 2.97M posts

📊 Dataset

Reddit Canada Dataset

  • Source: r/Canada subreddit
  • Total Posts: 2,972,749
  • Time Period: Historical discussions
  • Embedding Dimensions: 384
  • Vector Database Size: 16GB
  • Metadata: Titles, scores, comments, years

Statistics

{
  "total_posts": 2972749,
  "embedding_dimension": 384,
  "vector_db_size": "16GB",
  "model": "all-MiniLM-L6-v2",
  "database": "ChromaDB"
}

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+
  • Git
  • 8GB+ RAM (for running locally with full dataset)

Backend Setup

# 1. Clone repository
git clone https://github.com/Shams261/Verity.git
cd Verity/canadaconvo-backend

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env and add your API keys:
# - GROQ_API_KEY=your_groq_key_here
# - HACKATHON_API_KEY=your_gemini_key_here (optional fallback)

# 5. Start server
python app.py

Backend will be running at http://localhost:8000

API Documentation: http://localhost:8000/docs

Frontend Setup

# 1. Navigate to frontend
cd ../canadaconvo-frontend

# 2. Install dependencies
npm install

# 3. Configure API endpoint (if needed)
# Create .env.local with:
# VITE_API_URL=http://localhost:8000/api/v1

# 4. Start development server
npm run dev

Frontend will be running at http://localhost:5173


🔧 Configuration

Environment Variables

Backend (.env)

# LLM Configuration
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.3-70b-versatile
USE_GROQ=true

# Fallback LLM (optional)
HACKATHON_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash

# Embeddings
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_DIM=384

# RAG Settings
RETRIEVAL_TOP_K=50
RERANK_TOP_K=20

# Server
HOST=0.0.0.0
PORT=8000
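One plausible way the backend could read these variables is sketched below, stdlib-only with `os.getenv`. The real project keeps its configuration in `config/settings.py`, so the class and function names here are illustrative assumptions, not the actual module; the defaults match the values shown above:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    use_groq: bool
    retrieval_top_k: int
    rerank_top_k: int

def load_settings() -> Settings:
    """Read the documented environment variables, falling back to the README's defaults."""
    return Settings(
        groq_api_key=os.getenv("GROQ_API_KEY", ""),
        groq_model=os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile"),
        use_groq=os.getenv("USE_GROQ", "true").lower() == "true",
        retrieval_top_k=int(os.getenv("RETRIEVAL_TOP_K", "50")),
        rerank_top_k=int(os.getenv("RERANK_TOP_K", "20")),
    )
```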

Frontend (.env.local)

# Backend API URL
VITE_API_URL=https://shams0026-canadaconvo-backend.hf.space/api/v1

📚 API Documentation

Base URL

https://shams0026-canadaconvo-backend.hf.space/api/v1

Endpoints

1. Query Endpoint

POST /query

Submit a natural language query to analyze Canadian discourse.

Request:

{
  "query": "What are Canadians saying about housing affordability?",
  "top_k": 20
}

Response:

{
  "query": "What are Canadians saying about housing affordability?",
  "answer": "Based on analyzing discussions from r/Canada...",
  "sources": [
    {
      "id": "post_123",
      "title": "Housing crisis discussion",
      "text": "Post content...",
      "score": 1250,
      "year": "2023",
      "num_comments": 342
    }
  ],
  "metadata": {
    "num_sources": 20,
    "cached": false,
    "api_calls_used": 4,
    "api_calls_remaining": 999999
  }
}
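Given a response shaped like the JSON above, a client might render the citations one line per source. The field names below are taken from the documented response; the helper itself is a hedged sketch, not the frontend's actual rendering code:

```python
def format_sources(response: dict) -> list:
    """Turn the documented `sources` array into one numbered citation line per post."""
    return [
        f'[{i}] "{s["title"]}" ({s.get("year", "n/a")}, score {s["score"]}, '
        f'{s["num_comments"]} comments)'
        for i, s in enumerate(response.get("sources", []), start=1)
    ]

# A response trimmed to the fields the formatter uses:
example = {
    "answer": "Based on analyzing discussions from r/Canada...",
    "sources": [
        {"id": "post_123", "title": "Housing crisis discussion",
         "score": 1250, "year": "2023", "num_comments": 342}
    ],
}
print(format_sources(example)[0])
```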

2. Health Check

GET /health

Check service status.

Response:

{
  "status": "healthy",
  "service": "CanadaConvo",
  "version": "1.0.0"
}

3. Statistics

GET /stats

Get system statistics.

Response:

{
  "total_posts": 2972749,
  "api_calls_used": 4,
  "api_calls_remaining": 999999,
  "embedding_model": "all-MiniLM-L6-v2",
  "llm_provider": "Groq",
  "llm_model": "llama-3.3-70b-versatile",
  "using_groq": true
}

📁 Project Structure

Verity/
├── canadaconvo-backend/          # Python FastAPI Backend
│   ├── app/
│   │   ├── api/                  # API endpoints
│   │   │   └── v1/
│   │   │       └── endpoints/
│   │   │           ├── query.py      # POST /query
│   │   │           └── health.py     # GET /health, /stats
│   │   ├── core/                 # Core business logic
│   │   │   └── rag_engine.py        # Main RAG implementation
│   │   ├── db/                   # Database clients
│   │   │   └── chroma_client.py     # ChromaDB integration
│   │   ├── models/               # Pydantic models
│   │   │   ├── query.py             # Request/Response models
│   │   │   └── post.py              # Post model
│   │   ├── services/             # External services
│   │   │   ├── embedding_service.py # Embeddings
│   │   │   ├── groq_service.py      # Groq API client
│   │   │   └── gemini_service.py    # Gemini API client
│   │   └── utils/                # Utilities
│   │       └── data_loader.py       # Data management
│   ├── config/
│   │   └── settings.py           # Configuration
│   ├── data/                     # Data files (16GB+)
│   │   ├── processed/
│   │   │   ├── embeddings.npy       # Pre-computed embeddings
│   │   │   └── metadata.parquet     # Post metadata
│   │   └── chroma/               # ChromaDB persistent storage
│   ├── scripts/                  # Data processing scripts
│   │   ├── 02_clean_data.py
│   │   ├── 03_generate_embeddings.py
│   │   └── 04_build_indexes.py
│   ├── .env                      # Environment variables (gitignored)
│   ├── .env.example              # Template for env vars
│   ├── Dockerfile                # Docker configuration
│   ├── requirements.txt          # Python dependencies
│   ├── app.py                    # Server entry point
│   └── README.md                 # Backend documentation
│
├── canadaconvo-frontend/         # React + TypeScript Frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatInterface.tsx    # Main chat UI
│   │   │   ├── MessageDisplay.tsx   # Message rendering
│   │   │   ├── SourceCard.tsx       # Citation cards
│   │   │   └── StatsPanel.tsx       # Header stats
│   │   ├── services/
│   │   │   └── api.ts               # API client
│   │   ├── types/
│   │   │   └── index.ts             # TypeScript types
│   │   ├── App.tsx                  # Root component
│   │   ├── main.tsx                 # Entry point
│   │   └── index.css                # Global styles
│   ├── public/                   # Static assets
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts            # Vite configuration
│   └── tailwind.config.js        # Tailwind config
│
├── .gitignore                    # Git ignore rules
└── README.md                     # This file

🔄 How It Works

End-to-End Flow

1. USER INPUT
   └─ User types: "What issues concern Canadians most?"

2. FRONTEND
   └─ POST /api/v1/query

3. BACKEND API
   └─ Receives QueryRequest

4. RAG ENGINE
   ├─ Embedding Generation
   │  └─ Convert query to 384-dim vector
   │
   ├─ Vector Search
   │  └─ Find top 20 similar posts in ChromaDB
   │
   ├─ Context Building
   │  └─ Structure prompt with retrieved posts
   │
   ├─ LLM Generation
   │  └─ Groq API generates insights
   │
   └─ Response Formatting
      └─ Return {answer, sources, metadata}

5. FRONTEND DISPLAY
   ├─ Render AI answer
   ├─ Show source cards
   └─ Display metadata

6. USER SEES RESULT
   └─ Comprehensive answer with citations

Performance Metrics

| Metric | Value |
| --- | --- |
| Dataset Size | 2.97M posts |
| Embedding Dimension | 384 |
| First Query | ~3-5s (with Groq) |
| Cached Query | <1s |
| Vector Search | ~100ms |
| LLM Inference | ~2-3s (Groq) |
| Default Top-K | 20 posts |

🎨 Features in Detail

1. Semantic Search

Uses sentence-transformers to convert text into 384-dimensional vectors, enabling semantic understanding beyond keyword matching.

2. Groq-Powered Inference

Leverages Groq's ultra-fast LLM infrastructure for rapid response generation with llama-3.3-70b-versatile (~2-3s per uncached query).

3. Source Attribution

Every answer includes citations to original Reddit posts, ensuring transparency and verifiability.

4. Smart Caching

Frequently asked questions are cached to provide instant responses and reduce API costs.
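The caching described above can be as simple as a dict keyed by a normalized form of the query, so trivially different phrasings share an entry. This is an illustrative sketch, not the project's actual cache implementation:

```python
_cache: dict = {}

def normalize(query: str) -> str:
    """Collapse case and whitespace so near-identical queries hit the same entry."""
    return " ".join(query.lower().split())

def answer_with_cache(query: str, generate):
    """Return (answer, cached_flag); `generate` stands in for the expensive RAG call."""
    key = normalize(query)
    if key in _cache:
        return _cache[key], True
    answer = _cache[key] = generate(query)
    return answer, False
```

On a hit the expensive embed-retrieve-generate path is skipped entirely, which is what makes cached queries return in under a second.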

5. Real-time Statistics

Live dashboard showing:

  • Total posts in dataset (2.97M)
  • API usage tracking
  • Active LLM provider (Groq/Gemini)

🚢 Deployment

Hugging Face Space (Backend)

The backend is deployed on Hugging Face Spaces using Docker:

title: Verity Backend API
emoji: 🍁
sdk: docker
app_port: 7860

Secrets Configuration:

  • GROQ_API_KEY - Your Groq API key
  • HACKATHON_API_KEY - Gemini API key (optional)
  • HF_TOKEN - Hugging Face token for dataset access

Frontend Deployment

The frontend can be deployed on:

  • Vercel (Recommended)
  • Netlify
  • GitHub Pages
  • Any static hosting service

Environment Variable:

VITE_API_URL=https://shams0026-canadaconvo-backend.hf.space/api/v1

🔐 Security

API Key Protection

  • ✅ All API keys stored in environment variables
  • ✅ .env files gitignored
  • ✅ No hardcoded credentials in code
  • ✅ Secrets managed via HF Space settings

CORS Configuration

Backend allows requests from:

  • localhost:3000 (React dev)
  • localhost:5173 (Vite dev)
  • *.vercel.app (Vercel deployments)
  • *.hf.space (HF Space frontend)

🤝 Contributing

We welcome contributions! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Commit Message Convention

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes
  • refactor: Code refactoring
  • test: Test additions/changes
  • chore: Build process or tool changes

🐛 Troubleshooting

Backend Issues

Problem: ChromaDB collection not found

Solution: Ensure data files are downloaded
The app will auto-download on first run

Problem: Groq API errors

Solution: Check your GROQ_API_KEY in .env
Verify you have API credits remaining

Problem: Out of memory

Solution: Reduce RETRIEVAL_TOP_K in config
Or increase available RAM (8GB+ recommended)

Frontend Issues

Problem: API connection refused

Solution: Verify VITE_API_URL in .env.local
Check backend is running

Problem: Build errors

Solution: Clear node_modules and reinstall
rm -rf node_modules package-lock.json
npm install

📜 License

This project was created for the AI Hackathon in the North 2026.

Organized by

The AI Collective - Thunder Bay


🙏 Acknowledgments

  • The AI Collective - Thunder Bay - For organizing the AI Hackathon in the North
  • Hackathon Organizers - For providing API access and resources
  • r/Canada Community - For the rich discussion dataset
  • Groq - For ultra-fast LLM inference
  • ChromaDB - For efficient vector storage
  • Sentence Transformers - For high-quality embeddings
  • FastAPI - For the excellent Python web framework
  • React Team - For the amazing frontend library
  • Hugging Face - For hosting infrastructure

👥 Team

Built with ❤️ for the AI Hackathon in the North 2026


📧 Contact

For questions, suggestions, or feedback:


🌟 Star History

If you find this project useful, please consider giving it a ⭐!


Made with 🇨🇦 by the Verity Team
