Codefa/pdf-chat-vector

📚 PDF Chat with Vector Search

A full-stack application that allows users to upload PDFs, automatically parse and embed them into a PostgreSQL vector database, and chat with an AI assistant about the PDF contents using semantic search.

✨ Features

  • 📄 PDF Upload & Processing: Upload PDFs with automatic text extraction and chunking
  • 🤖 AI-Powered Chat: Chat with an intelligent assistant about your PDF contents
  • 🔍 Semantic Search: Find relevant information using vector similarity search
  • 💬 Chat History: Persistent conversation history for each PDF
  • ⚡ Streaming Responses: Real-time AI responses for better user experience
  • 📊 Performance Monitoring: Built-in timing and performance logging

🏗️ Architecture

  • Backend: Node.js + Express + TypeScript + Prisma
  • Frontend: React 18 + TypeScript + Redux Toolkit + Tailwind CSS
  • Database: PostgreSQL with pgvector extension for vector operations
  • AI Services: OpenAI API for embeddings and chat completions
  • Vector Search: HNSW indexing for efficient similarity search
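
As a rough illustration of what the vector search layer computes, the sketch below ranks chunks by exact cosine distance in plain TypeScript. An HNSW index approximates this ranking without scanning every row. The names `StoredChunk`, `cosineDistance`, and `topK` are hypothetical and do not come from the repository, and cosine distance is an assumed metric.

```typescript
// Hypothetical in-memory stand-in for a row of the embeddings table.
interface StoredChunk {
  chunk: string;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity; smaller means more similar.
// Note: a zero vector yields NaN because its norm is 0.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) top-K: sort a copy of the rows by distance to the query.
function topK(query: number[], rows: StoredChunk[], k: number): StoredChunk[] {
  return [...rows]
    .sort(
      (x, y) =>
        cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding)
    )
    .slice(0, k);
}
```

The brute-force version costs one distance computation per stored chunk, which is the work an approximate index is meant to avoid on larger collections.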

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • PostgreSQL 13+ with pgvector extension
  • OpenAI API key

1. Clone the Repository

git clone https://github.com/codefa/pdf-chat-vector.git
cd pdf-chat-vector

2. Set Up Database

Install PostgreSQL and the pgvector extension:

# Ubuntu/Debian
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-13-pgvector

# macOS with Homebrew
brew install postgresql
brew install pgvector

Create a database and enable the pgvector extension:

CREATE DATABASE pdf_chat;
\c pdf_chat
CREATE EXTENSION vector;

3. Environment Setup

Create a .env file in backend/:

DATABASE_URL="postgresql://username:password@localhost:5432/pdf_chat"
OPENAI_API_KEY="your-openai-api-key-here"
PORT=5000
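
A minimal sketch of how the backend might validate these variables at startup, so that a missing key fails fast instead of surfacing later as a database or API error. The `AppConfig` shape and the `loadConfig` helper are hypothetical and are not taken from the repository.

```typescript
// Hypothetical typed view of the variables from the .env file above.
interface AppConfig {
  databaseUrl: string;
  openaiApiKey: string;
  port: number;
}

// Validates an environment map (for example process.env) and returns typed config.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const databaseUrl = env["DATABASE_URL"];
  const openaiApiKey = env["OPENAI_API_KEY"];
  if (!databaseUrl) throw new Error("DATABASE_URL is not set");
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is not set");

  // Fall back to 5000, the value used in the sample .env file.
  const rawPort = env["PORT"] ?? "5000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return { databaseUrl, openaiApiKey, port };
}
```

In a Node.js entry point this would typically be called once as `loadConfig(process.env)` after the .env file has been loaded.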

4. Install Dependencies

# Backend
cd backend
npm install

# Frontend (in another terminal)
cd frontend
npm install

5. Database Migration

cd backend
npx prisma migrate dev
npx prisma generate

6. Start the Application

# Backend (Terminal 1)
cd backend
npm run dev

# Frontend (Terminal 2)
cd frontend
npm run dev

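Once both processes are running, the backend listens on http://localhost:5000 (the PORT value from .env), and the Vite dev server prints the frontend URL when it starts.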

📖 Usage

1. Upload a PDF

  • Navigate to the home page
  • Enter a title for your PDF
  • Select a PDF file
  • Click "Upload" to process the document

2. Chat with Your PDF

  • Go to the PDFs list page
  • Click "Chat" on any uploaded PDF
  • Ask questions about the document content
  • Get AI-powered responses based on semantic search

3. Manage PDFs

  • View all uploaded PDFs
  • Delete PDFs and related data
  • Access chat history for each document

🗄️ API Endpoints

PDF Routes

  • POST /api/pdf/upload - Upload and process PDF
  • GET /api/pdf/list - List all PDFs
  • DELETE /api/pdf/:id - Delete PDF and related data

Chat Routes

  • POST /api/chat/:pdfId - Start/continue chat with streaming
  • GET /api/chat/history/:chatId - Get chat history
  • GET /api/chat/bench/:pdfId - Performance benchmarking
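
The routes above can be captured in a small typed helper so that client code does not hand-assemble paths. This is only a sketch: the `routes` object and `toUrl` function are hypothetical names, and request bodies and response shapes are left out because they are not documented here.

```typescript
type HttpMethod = "GET" | "POST" | "DELETE";

interface Route {
  method: HttpMethod;
  path: string;
}

type Id = string | number;

// Path segments are URL-encoded in case ids are not plain integers.
const seg = (id: Id): string => encodeURIComponent(String(id));

// One builder per endpoint listed above.
const routes = {
  uploadPdf: (): Route => ({ method: "POST", path: "/api/pdf/upload" }),
  listPdfs: (): Route => ({ method: "GET", path: "/api/pdf/list" }),
  deletePdf: (id: Id): Route => ({ method: "DELETE", path: `/api/pdf/${seg(id)}` }),
  chat: (pdfId: Id): Route => ({ method: "POST", path: `/api/chat/${seg(pdfId)}` }),
  chatHistory: (chatId: Id): Route => ({ method: "GET", path: `/api/chat/history/${seg(chatId)}` }),
  bench: (pdfId: Id): Route => ({ method: "GET", path: `/api/chat/bench/${seg(pdfId)}` }),
};

// Joins a base URL and a route path, tolerating trailing slashes on the base.
function toUrl(baseUrl: string, route: Route): string {
  return baseUrl.replace(/\/+$/, "") + route.path;
}
```

A caller could then write something like `fetch(toUrl(base, r), { method: r.method })` with `r = routes.listPdfs()`, where `base` is wherever the backend is reachable.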

🗄️ Database Schema

-- PDF metadata
PDF (id, title, filename, createdAt)

-- Vector embeddings for text chunks
Embeddings (id, pdfId, chunk, embedding)

-- Chat sessions
Chat (id, pdfId, createdAt)

-- Conversation messages
Message (id, chatId, role, content, createdAt)
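
For orientation, the tables above could be mirrored in TypeScript roughly as follows. The field types are assumptions (ids might be strings or UUIDs in the real schema, and the set of roles is a guess). The `toVectorLiteral` helper is hypothetical and shows the `[x,y,z]` text form that pgvector accepts when a value is cast to `vector`.

```typescript
// Assumed shapes; only the field names come from the schema listing above.
interface PdfRecord {
  id: number;
  title: string;
  filename: string;
  createdAt: Date;
}

interface EmbeddingRecord {
  id: number;
  pdfId: number;
  chunk: string;       // the text of one chunk
  embedding: number[]; // stored as a pgvector column in PostgreSQL
}

interface ChatSession {
  id: number;
  pdfId: number;
  createdAt: Date;
}

interface ChatMessage {
  id: number;
  chatId: number;
  role: "user" | "assistant"; // assumed set of roles
  content: string;
  createdAt: Date;
}

// Serializes an embedding into pgvector's text input format, e.g. "[0.1,-2,3]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((v) => !Number.isFinite(v))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

Such a literal is useful in raw SQL, since the vector column has to be written through a text-to-vector cast.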

🛠️ Technology Stack

Backend

  • Runtime: Node.js with TypeScript
  • Framework: Express.js
  • Database: PostgreSQL + pgvector
  • ORM: Prisma
  • AI: OpenAI API
  • File Processing: pdf-parse

Frontend

  • Framework: React 18 with TypeScript
  • State Management: Redux Toolkit
  • Routing: React Router DOM
  • Styling: Tailwind CSS
  • Build Tool: Vite

⚙️ Configuration

Vector Search Parameters

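The application uses HNSW indexing, and the embeddings service sets query-time search parameters inside the transaction that runs the similarity query. The hnsw.ef_search setting belongs to pgvector's HNSW index. The diskann.* settings apply to DiskANN indexes from the pgvectorscale extension rather than to HNSW:

// In embeddings service
// SET LOCAL only lasts until the end of the current transaction (tx)
// DiskANN settings (pgvectorscale)
await tx.$executeRawUnsafe('SET LOCAL diskann.query_rescore = 400')
await tx.$executeRawUnsafe('SET LOCAL diskann.query_search_list_size = 100')
// HNSW setting (pgvector): size of the candidate list during search
await tx.$executeRawUnsafe('SET LOCAL hnsw.ef_search = 50')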
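
A sketch of how a similarity query might be built to run in the same transaction as these settings. The table and column names are taken from the schema listing but their exact casing is an assumption, as are the numeric `pdfId` and the cosine distance operator `<=>`. The `buildSearchQuery` helper is hypothetical.

```typescript
interface SearchQuery {
  sql: string;
  params: unknown[];
}

// Builds a parameterized nearest-neighbor query for one PDF.
// $1 = query embedding as a pgvector text literal, $2 = pdfId, $3 = row limit.
function buildSearchQuery(
  pdfId: number,
  queryEmbedding: number[],
  topK: number = 3
): SearchQuery {
  if (!Number.isInteger(topK) || topK < 1) {
    throw new Error("topK must be a positive integer");
  }
  const vector = `[${queryEmbedding.join(",")}]`;
  // Ordering by the distance expression with a LIMIT is the query shape
  // that lets PostgreSQL use an approximate index on the vector column.
  const sql =
    'SELECT "chunk", "embedding" <=> $1::vector AS distance ' +
    'FROM "Embeddings" WHERE "pdfId" = $2 ' +
    'ORDER BY "embedding" <=> $1::vector LIMIT $3';
  return { sql, params: [vector, pdfId, topK] };
}

// Possible use with Prisma (not executed here):
//
//   const rows = await prisma.$transaction(async (tx) => {
//     await tx.$executeRawUnsafe('SET LOCAL hnsw.ef_search = 50');
//     const { sql, params } = buildSearchQuery(pdfId, embedding);
//     return tx.$queryRawUnsafe(sql, ...params);
//   });
```

Running the SET LOCAL statements and the query in the same transaction matters, because the settings are discarded when the transaction ends.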

AI Model Configuration

  • Embeddings: text-embedding-3-small
  • Chat: gpt-4o-mini
  • Temperature: 0.2
  • Max Tokens: 256
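
To show how these settings fit together, here is a sketch of a request body for a chat completion that carries the retrieved chunks as context. The `buildChatRequest` helper and the wording of the system prompt are hypothetical. Only the model name, temperature, and token limit come from the list above.

```typescript
interface PromptMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  temperature: number;
  max_tokens: number;
  stream: boolean;
  messages: PromptMessage[];
}

// Combines the user's question with the retrieved chunks into one request.
function buildChatRequest(question: string, contextChunks: string[]): ChatRequest {
  // Number the excerpts so the model can tell them apart.
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");
  return {
    model: "gpt-4o-mini",
    temperature: 0.2, // low temperature keeps answers close to the source text
    max_tokens: 256,  // caps the length of each reply
    stream: true,     // request incremental output for streaming responses
    messages: [
      {
        role: "system",
        content: `Answer using only the PDF excerpts below.\n\n${context}`,
      },
      { role: "user", content: question },
    ],
  };
}
```

The resulting object follows the general shape of an OpenAI chat completions request. How the repository actually assembles its prompt is not shown here.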

📊 Performance Features

  • Chunking Strategy: ~1000 character chunks for optimal embedding
  • Top-K Retrieval: Retrieves top 3 most relevant chunks
  • Streaming Responses: Real-time AI responses
  • Performance Logging: Built-in timing metrics
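
The chunking step can be sketched as below. This is an assumed strategy (greedy packing of whitespace-separated words, with a hard split for overlong words), not necessarily the one the repository uses. `chunkText` is a hypothetical name.

```typescript
// Splits text into chunks of at most maxChars characters,
// breaking on whitespace where possible.
function chunkText(text: string, maxChars: number = 1000): string[] {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new Error("maxChars must be a positive integer");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const word of words) {
    // Flush the current chunk if adding this word (plus a space) would overflow.
    if (current.length > 0 && current.length + 1 + word.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    // A single word longer than maxChars is cut into fixed-size pieces.
    // At this point current is always empty for such a word.
    let rest = word;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = current.length > 0 ? `${current} ${rest}` : rest;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk would then be embedded separately, and the three chunks closest to the question would be passed to the chat model.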

🚀 Deployment

Docker (Recommended)

# Backend Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 5000
CMD ["npm", "start"]

Environment Variables for Production

NODE_ENV=production
DATABASE_URL="your-production-database-url"
OPENAI_API_KEY="your-production-openai-key"
PORT=5000

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow TypeScript best practices
  • Maintain RESTful API design principles
  • Add tests for new features
  • Update documentation as needed

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

📞 Support

If you encounter any issues or have questions:

  1. Check the Issues page
  2. Create a new issue with detailed information
  3. Include your environment details and error logs

Made with ❤️ for developers who love AI and vector search!
