CVBot Backend

A sophisticated AI-powered personal assistant backend designed to provide seamless access to your professional profile through intelligent document retrieval, context-aware responses, and proactive suggestions.

🚀 Features

Intelligent Agent (LangGraph)
Retrieves relevant information from your documents and alerts you when knowledge gaps are detected.
Suggestions Engine
Generates follow-up or suggested questions to guide more effective interactions.
Document Processing
Advanced OCR and text pipeline supporting multiple file formats with smart chunking.
Vector Search
Efficient semantic retrieval with ChromaDB.
Notifications
Automatic email alerts sent via SMTP when missing or incomplete information is found.
File Management
Full CRUD support for documents in the vector database.
API Tracing & Monitoring
Powered by Opik for observability and debugging.

🛠️ Tech Stack

Framework: FastAPI
Agent: LangGraph
Database: PostgreSQL + ChromaDB (vector storage)
Cache / Rate Limiting: Redis
Authentication: Clerk
OCR: Docling + EasyOCR
Monitoring: Opik
Language: Python
Email Notifications: SMTP

📋 Architecture

Document Processing Pipeline

File Upload – PDF files processed with Docling for OCR extraction.
Markdown Conversion – Unified content representation.
Smart Chunking – Content split by headings and intelligently chunked with an LLM.
Context Enhancement – Each chunk enriched with surrounding context.
Vector Embedding – Enhanced chunks stored in ChromaDB for semantic retrieval.

Agent Workflow

Query Processing – User queries handled by the LangGraph agent.
Vector Search – Relevant embeddings retrieved from ChromaDB.
Response Generation – Professional, contextually accurate responses produced.
Suggestions Node – Asynchronous node that generates question suggestions to improve user interaction.
Notification System – Missing information triggers email notifications via SMTP.

flowchart TD
    U[User Query] --> Q[LangGraph Agent]
    Q --> R[Vector Search in ChromaDB]
    R --> S[Response Generation]
    S --> T[User Response]
    Q --> SG[Suggestions Node]
    SG --> T
    Q --> N[Missing Info Detected]
    N --> EML[Email Notification via SMTP]

🛤️ Project Routes

The CVBot Backend exposes modular routes under /api/v1:

/vector-store – Manages documents in the vector DB.
- POST /vector-store/store-files: Upload & process documents.
- POST /vector-store/semantic-search: Perform semantic search.
- GET /vector-store/retrieve-files: List processed files.
- GET /vector-store/retrieve-embeddings?filename=...: Retrieve embeddings for a file.
- DELETE /vector-store/delete-files: Remove files.
/projects – Manage portfolio projects.
- GET /projects: List all projects.
- POST /projects: Add new project.
- DELETE /projects/{project_id}: Delete project by ID.
/chatbot – AI chatbot interaction.
- POST /chatbot/invoke: Send a message to the AI agent. (Rate limited: 20 req/min per user)
- GET /chatbot/history?session_id=...: Retrieve session history.

vector-store and projects routes are secured with Clerk authentication.

📊 Monitoring

Tracing & monitoring with Opik for reliable performance and debugging.

🔍 Document Processing Details

Supported File Formats

PDF: Processed with Docling (OCR support).
Markdown: Directly processed.

Chunking Strategy (Inspired by Anthropic's Contextual Retrieval)

Heading-based Split – Structural splitting.
LLM-powered Chunking – Optimized chunk sizes.
Context Addition – Enriched with surrounding text.
Semantic Embedding – Converted to vectors for similarity search.

🤖 Agent Capabilities

The LangGraph-powered agent provides:

Semantic search through professional documents.
Contextual answers about experience & skills.
Identification of knowledge gaps with SMTP email alerts.
Conversation context maintenance for seamless dialogue.
Professional, tailored responses.
Suggested follow-up questions for improved engagement.

🔐 Security

Authentication: Clerk-based authentication.
API Security: Redis-based rate limiting & strict input validation.

💻 Installation

Navigate to the deployment folder.
Copy .env.example → .env and configure environment variables (incl. Redis & SMTP).
Run docker compose up to start services (includes Redis).
Containers initialize; NGINX exposes port 80 → host 8081 (configurable).
Access via localhost:8081 or a reverse proxy.

👨‍💻 Author

Originally built by Kaloyan Stefanov.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
.idea		.idea
deployment		deployment
docs		docs
migrations/cvbot		migrations/cvbot
src		src
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CVBot Backend

🚀 Features

🛠️ Tech Stack

📋 Architecture

Document Processing Pipeline

Agent Workflow

🛤️ Project Routes

📊 Monitoring

🔍 Document Processing Details

Supported File Formats

Chunking Strategy (Inspired by Anthropic's Contextual Retrieval)

🤖 Agent Capabilities

🔐 Security

💻 Installation

👨‍💻 Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CVBot Backend

🚀 Features

🛠️ Tech Stack

📋 Architecture

Document Processing Pipeline

Agent Workflow

🛤️ Project Routes

📊 Monitoring

🔍 Document Processing Details

Supported File Formats

Chunking Strategy (Inspired by Anthropic's Contextual Retrieval)

🤖 Agent Capabilities

🔐 Security

💻 Installation

👨‍💻 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages