A lightweight vector database built in Rust with persistent storage.
- Efficient vector similarity search
- Persistent JSON storage
- UUID-based document identification
- Support for associating vectors with filenames
- Simple CLI interface for common operations
- Clone the repository:
git clone https://github.com/varadanvk/arrow.git
cd arrow
- Build the project:
cargo build --release
- Run the executable:
./target/release/arrow

The CLI provides several commands to interact with the vector database:
-d, --database <PATH>: Specify the path to the vector store file (default: vector_store.json)
-h, --help: Print help information
-V, --version: Print version information

arrow create [OPTIONS]
Options:
-m, --max-connections <NUM>: Maximum connections per node (default: 16)
Example:
arrow create --max-connections 32

arrow add <FILES>...
Example:
arrow add document1.txt document2.txt
This will:
- Read the text from each file
- Split it into chunks (max 512 characters each)
- Generate embeddings using the All-MiniLM-L6-v2 model
- Add each chunk with its embedding to the vector store
- Save the updated vector store to disk
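
The chunking step above can be illustrated with a minimal sketch. The function name `chunk_text` and the strategy of cutting at a fixed character count are assumptions made for illustration; Arrow's actual chunker may split on word or sentence boundaries instead.

```rust
/// Split `text` into chunks of at most `max_chars` characters each.
/// Hypothetical helper, not Arrow's actual implementation.
/// Counts Unicode scalar values (chars), not bytes, so a multi-byte
/// character is never cut in half.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect()
}

fn main() {
    // 1200 characters with a 512-character limit gives chunks of 512, 512 and 176.
    let text = "a".repeat(1200);
    let chunks = chunk_text(&text, 512);
    for (i, chunk) in chunks.iter().enumerate() {
        println!("chunk {}: {} chars", i, chunk.chars().count());
    }
}
```

Each chunk would then be embedded and stored as its own entry, so a long file contributes several documents to the store.
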

arrow query [OPTIONS] <TEXT>
Options:
-t, --top-k <NUM>: Number of results to return (default: 5)
Example:
arrow query "What is a monopoly business?" --top-k 3

arrow list [OPTIONS]
Options:
-l, --limit <NUM>: Maximum number of documents to list (default: 10)
Example:
arrow list --limit 20

arrow info
This displays:
- The location of the vector store
- The number of documents
- The source files
Arrow consists of two main components:
- VectorStore: A hierarchical navigable small-world (HNSW) graph-based vector index with:
  - Multiple layers for efficient navigation
  - Configurable maximum connections per node
  - UUID-based document identification
- Embeddor: A text embedding module that:
  - Uses Hugging Face's Rust implementation of All-MiniLM-L6-v2
  - Supports chunking of long texts
  - Processes embeddings in parallel for better performance
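
To make the role of the index concrete, here is a minimal sketch of the exhaustive top-k search that an HNSW graph is designed to approximate more cheaply. It is not Arrow's code: the names `cosine_similarity` and `top_k` are hypothetical, and the use of cosine similarity as the metric is an assumption.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every stored vector against the query and return the indices
/// and scores of the `k` most similar ones, best first.
/// This is a brute-force baseline, not an HNSW traversal.
fn top_k(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending score; total_cmp provides a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let vectors: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0],
        vec![0.0, 1.0],
        vec![1.0, 1.0],
    ];
    let query: [f32; 2] = [1.0, 0.1];
    for (index, score) in top_k(&query, &vectors, 2) {
        println!("vector {} scored {:.3}", index, score);
    }
}
```

The exhaustive scan compares the query against every stored vector, so its cost grows linearly with the store. A layered graph with a bounded number of connections per node lets a search visit only a small fraction of the vectors, at the price of approximate results.
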

License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.