FAISS Vector Database – Built from Scratch

This project is a full vector database system built entirely from scratch, using:

- FAISS (Facebook AI Similarity Search) as the vector index
- Ollama + BGE-Large (1024-dim embeddings) as the embedding engine
- PostgreSQL as the metadata store
- Python for ingestion, embedding, indexing, and semantic search

The goal of this project is to design, build, and operate a production-style vector database without relying on ready-made vector stores such as Pinecone, Weaviate, or the pgvector extension for PostgreSQL.
Everything here demonstrates AI infrastructure engineering from first principles.

Project Overview

This system:

- Loads a dataset (in this case: 500k+ Amazon reviews) from PostgreSQL
- Generates dense vector embeddings using bge-large running locally via Ollama
- Builds a FAISS index for fast approximate nearest-neighbor (ANN) search
- Stores the FAISS index and ID mappings on disk
- Performs semantic search by:
  - Embedding user queries
  - Querying FAISS for the top-k similar vectors
  - Pulling full metadata from PostgreSQL
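
The search flow above can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the project's actual code: an exact brute-force scan stands in for the FAISS index, an in-memory `sqlite3` database stands in for PostgreSQL, and the 3-dimensional vectors are hand-written toys rather than real bge-large embeddings. The names `reviews`, `id_map`, `top_k`, and `search` are hypothetical.

```python
import math
import sqlite3


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(index_vectors, query, k):
    """Exact brute-force scan (stand-in for FAISS): (position, score), best first."""
    q = normalize(query)
    scored = [
        (pos, sum(a * b for a, b in zip(vec, q)))
        for pos, vec in enumerate(index_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical metadata table; in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        (101, "great battery life"),
        (102, "battery died quickly"),
        (103, "tasty coffee beans"),
    ],
)

# Toy embeddings keyed by row ID (assumed values, for illustration only).
toy_embeddings = {
    101: [0.9, 0.1, 0.0],
    102: [0.8, 0.3, 0.1],
    103: [0.0, 0.1, 0.9],
}

# The index stores vectors by position; id_map translates position -> row ID.
id_map = []
index_vectors = []
for review_id, vec in toy_embeddings.items():
    id_map.append(review_id)
    index_vectors.append(normalize(vec))


def search(query_vec, k=2):
    """Find the k nearest vectors, then pull their metadata from the database."""
    results = []
    for pos, score in top_k(index_vectors, query_vec, k):
        review_id = id_map[pos]
        row = conn.execute(
            "SELECT text FROM reviews WHERE id = ?", (review_id,)
        ).fetchone()
        results.append((review_id, row[0], score))
    return results


# In the real pipeline the query vector would come from the embedding model.
results = search([1.0, 0.2, 0.0], k=2)
for review_id, text, score in results:
    print(review_id, text, round(score, 3))
```

The point of the sketch is the separation of concerns: the index only knows vector positions, the ID mapping ties each position back to a database row, and the relational store holds everything else.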

This architecture mirrors how modern AI systems perform RAG (Retrieval-Augmented Generation) and semantic search at scale.

Why Build a Vector Database Yourself?

Most engineers only use hosted vector databases.
This project proves a deeper level of understanding:

- How embeddings work
- How vectors are stored
- How ANN indexes operate
- How semantic search is built
- How to connect vector storage to relational metadata
- How retrieval systems are designed internally

By building this manually, you gain hands-on mastery of the same components used by:

- OpenAI
- Google DeepMind
- Meta
- Databricks
- Snowflake
- NVIDIA
- Every modern RAG pipeline

This is practical AI infrastructure engineering.

🙌 Author

Ricci — Data Scientist & AI Engineer in training
Building real AI infrastructure from the ground up.

RicciJuaman/faiss-vector-database

Folders and files

Latest commit

History

Repository files navigation

FAISS Vector Database – Built from Scratch

Project Overview

Why Build a Vector Database Yourself?

🙌 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages