# pdf-chat-vector

A full-stack application that allows users to upload PDFs, automatically parse and embed them into a PostgreSQL vector database, and chat with an AI assistant about the PDF contents using semantic search.
## Features

- 📄 PDF Upload & Processing: Upload PDFs with automatic text extraction and chunking
- 🤖 AI-Powered Chat: Chat with an intelligent assistant about your PDF contents
- 🔍 Semantic Search: Find relevant information using vector similarity search
- 💬 Chat History: Persistent conversation history for each PDF
- ⚡ Streaming Responses: Real-time AI responses for better user experience
- 📊 Performance Monitoring: Built-in timing and performance logging
## Architecture

- Backend: Node.js + Express + TypeScript + Prisma
- Frontend: React 18 + TypeScript + Redux Toolkit + Tailwind CSS
- Database: PostgreSQL with pgvector extension for vector operations
- AI Services: OpenAI API for embeddings and chat completions
- Vector Search: HNSW indexing for efficient similarity search
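To make the "vector similarity search" idea concrete: in the app itself, pgvector computes distances inside PostgreSQL using the HNSW index, but the metric it ranks by can be shown in a few lines. The function below is a hypothetical stand-alone illustration, not code from this repo:

```typescript
// Hypothetical illustration of the metric behind vector similarity search.
// In the real app, pgvector evaluates this inside PostgreSQL via the HNSW
// index; this stand-alone version just shows what "similarity" means here.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];   // projection of a onto b
    normA += a[i] * a[i]; // squared magnitude of a
    normB += b[i] * b[i]; // squared magnitude of b
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1 and orthogonal vectors score 0, so retrieval keeps the chunks whose embeddings score highest against the query embedding.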
## Prerequisites

- Node.js 18+
- PostgreSQL 13+ with pgvector extension
- OpenAI API key
## Installation

Clone the repository:

```bash
git clone https://github.com/codefa/pdf-chat-vector.git
cd pdf-chat-vector
```

Install PostgreSQL and the pgvector extension:

```bash
# Ubuntu/Debian
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-13-pgvector

# macOS with Homebrew
brew install postgresql
brew install pgvector
```

Create a database and enable the pgvector extension:

```sql
CREATE DATABASE pdf_chat;
\c pdf_chat
CREATE EXTENSION vector;
```

Create a `.env` file in `backend/`:
```env
DATABASE_URL="postgresql://username:password@localhost:5432/pdf_chat"
OPENAI_API_KEY="your-openai-api-key-here"
PORT=5000
```

Install dependencies:

```bash
# Backend
cd backend
npm install

# Frontend (in another terminal)
cd frontend
npm install
```

Run database migrations:

```bash
cd backend
npx prisma migrate dev
npx prisma generate
```

Start the development servers:

```bash
# Backend (Terminal 1)
cd backend
npm run dev

# Frontend (Terminal 2)
cd frontend
npm run dev
```

The application will be available at:

- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
## Usage

### Upload a PDF

- Navigate to the home page
- Enter a title for your PDF
- Select a PDF file
- Click "Upload" to process the document
### Chat with a PDF

- Go to the PDFs list page
- Click "Chat" on any uploaded PDF
- Ask questions about the document content
- Get AI-powered responses based on semantic search
### Manage PDFs

- View all uploaded PDFs
- Delete PDFs and related data
- Access chat history for each document
## API Endpoints

### PDF Management

- `POST /api/pdf/upload` - Upload and process a PDF
- `GET /api/pdf/list` - List all PDFs
- `DELETE /api/pdf/:id` - Delete a PDF and related data

### Chat

- `POST /api/chat/:pdfId` - Start/continue a chat with streaming
- `GET /api/chat/history/:chatId` - Get chat history
- `GET /api/chat/bench/:pdfId` - Performance benchmarking
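The chat endpoint streams its reply incrementally. As a minimal sketch of consuming it, the helper below is hypothetical (not from the repo) and assumes the endpoint streams plain text chunks; in Node 18+, `fetch()` response bodies are async-iterable, so it applies directly:

```typescript
// Hypothetical helper for consuming a streamed chat reply chunk by chunk.
async function readStream(
  body: AsyncIterable<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const decoder = new TextDecoder();
  let full = "";
  for await (const value of body) {
    const text = decoder.decode(value, { stream: true });
    full += text;   // accumulate the complete reply
    onChunk(text);  // render each partial piece as it arrives
  }
  return full;
}

// Sketch of a call (the request body shape here is an assumption):
// const res = await fetch(`http://localhost:5000/api/chat/${pdfId}`, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ message: "Summarize this document" }),
// });
// const reply = await readStream(res.body!, (t) => process.stdout.write(t));
```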
## Database Schema

```
-- PDF metadata
PDF (id, title, filename, createdAt)

-- Vector embeddings for text chunks
Embeddings (id, pdfId, chunk, embedding)

-- Chat sessions
Chat (id, pdfId, createdAt)

-- Conversation messages
Message (id, chatId, role, content, createdAt)
```

## Tech Stack

### Backend

- Runtime: Node.js with TypeScript
- Framework: Express.js
- Database: PostgreSQL + pgvector
- ORM: Prisma
- AI: OpenAI API
- File Processing: pdf-parse
### Frontend

- Framework: React 18 with TypeScript
- State Management: Redux Toolkit
- Routing: React Router DOM
- Styling: Tailwind CSS
- Build Tool: Vite
## Performance Tuning

The application uses optimized HNSW indexing with configurable parameters:

```typescript
// In embeddings service
await tx.$executeRawUnsafe('SET LOCAL diskann.query_rescore = 400')
await tx.$executeRawUnsafe('SET LOCAL diskann.query_search_list_size = 100')
await tx.$executeRawUnsafe('SET LOCAL hnsw.ef_search = 50')
```

### AI Configuration

- Embeddings: `text-embedding-3-small`
- Chat: `gpt-4o-mini`
- Temperature: 0.2
- Max Tokens: 256
- Chunking Strategy: ~1000 character chunks for optimal embedding
- Top-K Retrieval: Retrieves top 3 most relevant chunks
- Streaming Responses: Real-time AI responses
- Performance Logging: Built-in timing metrics
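The ~1000-character chunking strategy can be sketched as follows. This is a hypothetical implementation, not the repo's actual code; it packs sentence-sized pieces into chunks up to a size limit:

```typescript
// Hypothetical sketch of the ~1000-character chunking step. Sentence-sized
// pieces are packed into chunks of at most maxLen characters so each chunk
// stays a good size for embedding.
function chunkText(text: string, maxLen = 1000): string[] {
  const chunks: string[] = [];
  let current = "";
  // Split after sentence-ending punctuation so chunks end at natural breaks.
  for (const piece of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + piece.length + 1 > maxLen) {
      chunks.push(current); // current chunk is full; start a new one
      current = piece;
    } else {
      current = current ? `${current} ${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks; // note: a single sentence longer than maxLen stays whole
}
```

Each resulting chunk is then embedded (here, with `text-embedding-3-small`) and stored alongside its vector in the `Embeddings` table.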
## Deployment

```dockerfile
# Backend Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
EXPOSE 5000
CMD ["npm", "start"]
```

Production environment variables:

```env
NODE_ENV=production
DATABASE_URL="your-production-database-url"
OPENAI_API_KEY="your-production-openai-key"
PORT=5000
```

## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
### Development Guidelines

- Follow TypeScript best practices
- Maintain RESTful API design principles
- Add tests for new features
- Update documentation as needed
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- OpenAI for AI services
- pgvector for PostgreSQL vector operations
- Prisma for database management
- Tailwind CSS for styling
## Support

If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include your environment details and error logs
Made with ❤️ for developers who love AI and vector search!