This repository demonstrates how to integrate vector stores with LangChain for building applications that require semantic search, retrieval-augmented generation (RAG), and other LLM-powered workflows.
Currently, the code covers two vector stores:
- Chroma – a lightweight, open-source embedding database.
- FAISS – Facebook AI Similarity Search, a high-performance library for efficient similarity search and clustering of dense vectors.
Vector stores allow you to efficiently store and search embeddings. Some common applications include:
- Question Answering over large documents.
- Context-aware chatbots with memory.
- Semantic Search beyond keyword matching.
- Recommendation Systems using similarity search.
- Retrieval-Augmented Generation (RAG) to enhance LLM outputs with external knowledge.
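All of these applications rest on the same primitive: embed texts as vectors, then rank them by similarity to an embedded query. As a minimal, dependency-free sketch of that idea (the toy three-dimensional "embeddings" below are made up for illustration; real embeddings come from a model and have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; in practice an embedding model produces these vectors.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # a query embedding "about pets"

# Brute-force semantic search: rank every document by similarity to the query.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # → cats
```

Vector stores like Chroma and FAISS do exactly this ranking, but with indexing structures that stay fast at millions of vectors.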
Chroma
- Open-source and lightweight.
- Simple local setup, ideal for prototypes and small-to-medium workloads.
- Integrates seamlessly with LangChain.
FAISS
- Developed by Facebook AI Research.
- Optimized for speed and memory efficiency.
- Supports both CPU and GPU acceleration.
- Well suited to large-scale similarity search.
Even though this repository focuses on Chroma and FAISS, here are some other production-grade vector databases you may consider:
- Weaviate
  - Cloud-native, open-source, and schema-based.
  - Supports hybrid search (combining keyword and vector search).
  - Has built-in modules for image and text embeddings.
- Milvus
  - Open-source and highly scalable.
  - Designed for billion-scale vector datasets.
  - Strong community support and active development.
- Pinecone
  - Fully managed vector database as a service.
  - Handles scaling, sharding, and replication automatically.
  - Great for enterprise use cases where infrastructure is managed.
- Qdrant
  - Open-source, written in Rust.
  - Provides high performance with a modern API.
  - Supports filters and hybrid search.