
RAG-based, all-in-one AI tool that helps financial institutions and regulators navigate the complexities of overlapping regulations, contradictions, and risks. Created during the Junction 2025 Hackathon, where it won 2nd place in the Bank of Finland challenge.


behramulukir/junction-2025


LEX: Legal and Exposure Expert

Production-grade RAG pipeline and UI for exploring EU legislation, national laws, and international standards. This monorepo contains the data pipeline, backend API, and a React/Vite frontend (with mock mode for running without the backend).

🔎 Overview

  • End-to-end: preprocessing → embeddings → Vertex AI Vector Search index → API → UI
  • Multilingual embeddings (text-multilingual-embedding-002, 768-dim)
  • Rich metadata (year, document type, source, article) and paragraph indices
  • Frontend can run standalone in mock mode (no cloud credits required)

🚀 Quick Start

Frontend

A UI for interacting with the backend and exploring the legislation in multiple ways.

cd frontend
npm install
VITE_USE_MOCK=true npm run dev

Open http://localhost:3000 and use the left sidebar to browse categories and subcategories. If you see a blank white screen, open the browser DevTools and check the Console for errors; the codebase already wraps App in an error boundary to surface runtime errors. To use a real backend instead of mock data, set VITE_API_URL (see Backend API below).

Backend API

This part requires Google Cloud access and credits.

# Python environment (macOS)
python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt

# Start FastAPI (choose one)
python backend/api_server.py
# or
uvicorn backend.api_server:app --host 0.0.0.0 --port 8000

# Frontend → point to API
cd frontend
export VITE_API_URL=http://localhost:8000
npm run dev

Notes:

  • The backend is a FastAPI server that exposes endpoints for regulations and analysis.
  • CORS is permissive by default for local development.
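The README does not list the backend's routes. As a minimal sketch, assuming FastAPI's default `/openapi.json` schema route is still enabled (the repository may disable or move it), this standard-library check confirms that the URL you export as VITE_API_URL actually answers before you start the frontend. The function name `api_is_up` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def api_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a FastAPI server answers at base_url.

    Assumption: the backend keeps FastAPI's default /openapi.json route.
    """
    url = base_url.rstrip("/") + "/openapi.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            schema = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    # FastAPI's generated schema lists the exposed routes under "paths".
    return isinstance(schema, dict) and "paths" in schema


if __name__ == "__main__":
    base = "http://localhost:8000"  # same value as VITE_API_URL above
    print("backend reachable:", api_is_up(base))
```

If it prints `False`, check that uvicorn is running on port 8000 before debugging the frontend.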

Data & Pipeline (local run)

The pipeline processes documents, generates embeddings, and builds a Vertex AI Vector Search index.

# Install core requirements
pip install -r requirements.txt

# Preprocess documents (local, no upload)
python scripts/preprocessing/preprocess_local.py \
  --config config.yaml \
  --skip-upload

# Generate embeddings
python scripts/embeddings/generate_embeddings.py \
  --input-prefix processed_chunks/ \
  --output-prefix embeddings_vertexai/

# Build index (requires GCP setup)
python scripts/deployment/build_vector_index.py \
  --embeddings-prefix embeddings_vertexai/ \
  --index-display-name eu-legislation-index

Pipeline summary:

  • Preprocessing produces chunked text with paragraph indices.
  • Embeddings are generated in Vertex AI-compatible format (768-dim).
  • Index is built in Vertex AI Vector Search with namespaces for filtering.
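The summary above mentions chunked text with paragraph indices, and the configuration section sets target, minimum, and maximum token sizes. The following is a minimal greedy sketch of how such a chunker could pack whole paragraphs while recording which paragraph indices each chunk covers. It is not the repository's `preprocess_local.py`; the names `Chunk` and `chunk_paragraphs` are hypothetical, and counting tokens by whitespace is an assumption (a real pipeline would more likely use a model tokenizer).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    paragraph_indices: list[int] = field(default_factory=list)
    tokens: int = 0


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words as a rough token proxy.
    return len(text.split())


def chunk_paragraphs(paragraphs, target=1200, min_tokens=400, max_tokens=1800):
    """Greedily pack whole paragraphs into chunks, keeping their indices."""
    chunks: list[Chunk] = []
    current = Chunk(text="")
    for idx, para in enumerate(paragraphs):
        n = count_tokens(para)
        # Close the current chunk if this paragraph would push it past max_tokens.
        if current.paragraph_indices and current.tokens + n > max_tokens:
            chunks.append(current)
            current = Chunk(text="")
        current.text = f"{current.text}\n\n{para}" if current.text else para
        current.paragraph_indices.append(idx)
        current.tokens += n
        # Close the chunk as soon as it reaches the target size.
        if current.tokens >= target:
            chunks.append(current)
            current = Chunk(text="")
    if current.paragraph_indices:
        prev = chunks[-1] if chunks else None
        # Fold an undersized tail into the previous chunk if it still fits.
        if prev and current.tokens < min_tokens and prev.tokens + current.tokens <= max_tokens:
            prev.text += "\n\n" + current.text
            prev.paragraph_indices += current.paragraph_indices
            prev.tokens += current.tokens
        else:
            chunks.append(current)
    return chunks
```

Keeping paragraphs intact is what makes paragraph-level excerpt extraction possible; the trade-off in this sketch is that a single paragraph longer than `max_tokens` becomes one oversized chunk instead of being split.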

✨ Features

  • Paragraph indices for precise excerpt extraction
  • Multi-source corpus: EU legislation, national laws (FI), international standards (Basel, IFRS)
  • Embeddings formatted for Vertex AI Vector Search, with namespaces (year, doc_type, source_type)
  • UI for visualizing overlaps and contradictions (with optional HTML/network views)
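To make "namespaces" concrete: Vertex AI Vector Search ingests JSON Lines records with an `id`, an `embedding`, and optional `restricts`, where each restrict names a namespace and lists allowed tokens. A minimal sketch of building one record with the three namespaces listed above follows. The helper name, the chunk ID, and the example values (`"regulation"`, `"eu_legislation"`) are hypothetical; the repository's actual values may differ.

```python
import json


def to_vector_search_record(chunk_id, embedding, year, doc_type, source_type):
    """Serialize one chunk as a JSON Lines record for a Vector Search index.

    Field names follow Vertex AI's JSON input format (id, embedding,
    restricts); the namespace names mirror the README's feature list.
    """
    if len(embedding) != 768:
        raise ValueError(f"expected 768 dims, got {len(embedding)}")
    record = {
        "id": str(chunk_id),
        "embedding": [float(x) for x in embedding],
        # Token restricts: values are strings, even for the year.
        "restricts": [
            {"namespace": "year", "allow": [str(year)]},
            {"namespace": "doc_type", "allow": [doc_type]},
            {"namespace": "source_type", "allow": [source_type]},
        ],
    }
    return json.dumps(record)


line = to_vector_search_record("chunk-000001", [0.0] * 768, 2013, "regulation", "eu_legislation")
```

At query time, a filter on these namespaces (for example `source_type` allowing only EU legislation) narrows the nearest-neighbour search without a separate index per source.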

Performance and testing:

  • Chunk sizes are tuned for embedding performance, targeting about 1,200 tokens per chunk.
  • Tests validate preprocessing correctness and Vertex AI output compliance.
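As a hedged illustration of what "Vertex AI output compliance" can mean (this is not the repository's `test_embedding_format.py`), a per-line check might verify the JSON shape, the 768-dimension size stated above, and the `id`/`embedding`/`restricts` field names from Vertex AI's JSON input format. The function name `validate_jsonl_line` is hypothetical.

```python
import json
import math

EXPECTED_DIM = 768


def validate_jsonl_line(line: str) -> list[str]:
    """Return the problems found in one index-input line (empty if it looks valid)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]
    errors = []
    if not isinstance(rec.get("id"), str) or not rec["id"]:
        errors.append("missing or non-string 'id'")
    emb = rec.get("embedding")
    if not isinstance(emb, list) or len(emb) != EXPECTED_DIM:
        errors.append(f"'embedding' must be a list of {EXPECTED_DIM} numbers")
    elif not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
        for x in emb
    ):
        errors.append("'embedding' contains non-numeric or non-finite values")
    restricts = rec.get("restricts", [])
    if not isinstance(restricts, list):
        errors.append("'restricts' must be a list")
        restricts = []
    for i, r in enumerate(restricts):
        if not isinstance(r, dict) or not isinstance(r.get("namespace"), str):
            errors.append(f"restricts[{i}] lacks a string 'namespace'")
            continue
        allow = r.get("allow", [])
        if not isinstance(allow, list) or not all(isinstance(t, str) for t in allow):
            errors.append(f"restricts[{i}].allow must be a list of strings")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easy to report every defect in a large embeddings file in one pass.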

🔧 Configuration

Configuration essentials (YAML example):

gcp:
  bucket_name: "your-bucket"       # bucket located in europe-west1 (EU West 1)
  output_prefix: "processed_chunks"

processing:
  chunk_target_tokens: 1200
  min_chunk_tokens: 400
  max_chunk_tokens: 1800
  input_directories:
    - "output"                    # EU legislation
    - "other_national_laws"       # National laws
    - "other_regulation_standards"  # International standards
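The three chunk-size settings only make sense when `min_chunk_tokens <= chunk_target_tokens <= max_chunk_tokens`. A minimal sanity check is sketched below, assuming `config.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The function `check_processing_config` is hypothetical and not part of the repository.

```python
def check_processing_config(cfg: dict) -> list[str]:
    """Sanity-check the chunk-size settings of a parsed config.yaml."""
    proc = cfg.get("processing", {})
    lo = proc.get("min_chunk_tokens")
    target = proc.get("chunk_target_tokens")
    hi = proc.get("max_chunk_tokens")
    problems = []
    for name, value in (
        ("min_chunk_tokens", lo),
        ("chunk_target_tokens", target),
        ("max_chunk_tokens", hi),
    ):
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{name} must be a positive integer")
    # Only compare the bounds once all three are valid integers.
    if not problems and not (lo <= target <= hi):
        problems.append("expected min_chunk_tokens <= chunk_target_tokens <= max_chunk_tokens")
    if not proc.get("input_directories"):
        problems.append("input_directories is empty")
    return problems


# The values from the YAML example above, as yaml.safe_load would return them.
example = {
    "gcp": {"bucket_name": "your-bucket", "output_prefix": "processed_chunks"},
    "processing": {
        "chunk_target_tokens": 1200,
        "min_chunk_tokens": 400,
        "max_chunk_tokens": 1800,
        "input_directories": ["output", "other_national_laws", "other_regulation_standards"],
    },
}
print(check_processing_config(example))  # prints []
```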

🧪 Testing & Validation

# Comprehensive preprocessing test
python scripts/testing/test_comprehensive.py

# Validate Vertex AI format
python scripts/testing/test_embedding_format.py

# End-to-end pipeline validation
python scripts/testing/validate_pipeline.py

📦 Repository Structure (brief)

  • frontend/ (frontend/README.md): React/Vite UI
  • backend/ (backend/README.md): FastAPI server endpoints
  • scripts/ (scripts/README.md): preprocessing, embeddings, deployment, and tests
  • docs/: user-facing reports and supplementary guides
  • deployment/: deployment helper scripts (optional)

🧭 Project Structure

.
├── README.md
├── config.yaml
├── Dockerfile
├── requirements.txt
├── QUICK_REFERENCE.md
├── CONFIG_QUICK_REF.md
│
├── backend/
│   ├── api_server.py
│   ├── cache_db.py
│   ├── rag_search.py
│   ├── Dockerfile
│   └── requirements.txt
│
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── vite.config.ts
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/
│       ├── components/
│       ├── data/
│       ├── styles/
│       └── types/
│
├── scripts/                            
│   ├── README.md 
│   ├── requirements.txt                     
│   │
│   ├── preprocessing/                 
│   │   ├── preprocess_local.py        
│   │   └── preprocess_and_upload.py    
│   │
│   ├── embeddings/                     
│   │   ├── generate_embeddings.py      
│   │   └── generate_embeddings_parallel.py 
│   │
│   ├── deployment/                     
│   │   ├── build_vector_index.py       
│   │   ├── deploy_quick.py             
│   │   └── check_deployment.py         
│   │
│   ├── testing/                        
│   │   ├── test_comprehensive.py       
│   │   ├── test_embedding_format.py    
│   │   ├── test_preprocessing.py       
│   │   └── validate_pipeline.py        
│   │
│   └── utilities/                      
│       ├── extract_paragraphs.py       
│       ├── rag_search.py               
│       ├── metadata_store.py           
│       └── monitor_build.sh            
│
├── deployment/
│   ├── README.md
│   ├── deploy-backend.sh
│   ├── deploy-frontend.sh
│   └── setup-deployment.sh
│
├── docs/
│   ├── QUICK_START.md
│   ├── IMPLEMENTATION_GUIDE.md
│   ├── VERTEX_AI_INTEGRATION.md
│   └── (additional reports and guides)
│
└── data/
    └── AllRiskCategories.json

📊 Current Data Status

  • Documents processed: 61,072
  • Total chunks: 334,000+ (≈919 MB)
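A quick back-of-the-envelope check on these figures, assuming 334,000 as a lower bound for the chunk count, that ≈919 MB refers to the processed chunk output in decimal megabytes, and that each 768-dimensional vector is held as 4-byte float32 values (an assumption about raw storage, not about how Vertex AI stores its index):

```python
documents = 61_072
chunks = 334_000          # lower bound: the README reports "334,000+"
corpus_mb = 919           # approximate size reported above (assumed 10**6 bytes per MB)
dim = 768                 # text-multilingual-embedding-002 output size
bytes_per_float32 = 4     # assumption: raw float32 storage

chunks_per_doc = chunks / documents               # about 5.47
bytes_per_chunk = corpus_mb * 10**6 / chunks      # about 2,751 bytes
vector_bytes = chunks * dim * bytes_per_float32   # 1,026,048,000 bytes

print(f"{chunks_per_doc:.2f} chunks/document")
print(f"{bytes_per_chunk:,.0f} bytes/chunk")
print(f"{vector_bytes / 10**9:.2f} GB of raw float32 vectors")
```

The raw vectors alone come to roughly 1 GB, slightly more than the chunked text itself, which is worth keeping in mind when budgeting storage and index build time.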

🔐 Compliance & Regions

  • Region: EU West 1 (europe-west1)
  • Bucket: uniform bucket-level access, with a retention policy
