Vectra is a file‑backed, in‑memory vector database for Node.js. It works like a local Pinecone or Qdrant: each index is just a folder on disk with an index.json file containing the vectors and any metadata fields you choose to index; all other metadata is stored per‑item in separate JSON files. Queries use a Pinecone‑compatible subset of MongoDB‑style operators for filtering, then rank matches by cosine similarity. Because the entire index is loaded into memory, lookups are extremely fast (often <1 ms for small indexes, commonly 1–2 ms for larger local sets). It’s ideal when you want simple, zero‑infrastructure retrieval over a small, mostly static corpus. Pinecone‑style namespaces aren’t built in, but you can mimic them by using separate folders (indexes).
Typical use cases:
- Prompt augmentation over a small, mostly static corpus
- Infinite few‑shot example libraries
- Single‑document or small multi‑document Q&A
- Local/dev workflows where hosted vector DBs are overkill
Table of contents
- Why Vectra
- When to use (and when not)
- Requirements
- Install
- Quick Start
- CLI in 60 seconds
- Core concepts
- File-backed vs in-memory usage
- Best practices
- Performance and limits
- Troubleshooting (quick)
- Next steps
- License
- Project links
Why Vectra
- Zero infrastructure: everything lives in a local folder; no servers, clusters, or managed services required.
- Predictable local performance: full in‑memory scans with pre‑normalized cosine similarity deliver sub‑millisecond to low‑millisecond latency for small/medium corpora.
- Simple mental model: one folder per index; index.json holds vectors and indexed fields, while non‑indexed metadata is stored as per‑item JSON.
- Easy portability: because the format is file‑based and language‑agnostic, indexes can be written in one language and read in another.
- Pinecone‑style filtering: use a familiar subset of MongoDB query operators to filter by metadata before similarity ranking.
- Great for prompt engineering: quickly assemble and retrieve few‑shot examples or small static corpora without external dependencies.
When to use (and when not)
Use Vectra when:
- You have a small, mostly static corpus (e.g., a few hundred to a few thousand chunks).
- You want zero‑infrastructure local retrieval with fast, predictable latency.
- You’re assembling “infinite few‑shot” example libraries or single/small document Q&A.
- You need portable, file‑based indexes that other languages can read/write.
- You want simple “namespaces” by using separate folders per dataset.
Avoid Vectra when:
- You need long‑term, ever‑growing chat memory or very large corpora (the entire index loads into RAM).
- You require multi‑tenant, networked, or horizontally scalable serving.
- You need advanced vector DB features like HNSW/IVF indexing, sharding/replication, or distributed operations.
Notes and tips:
- Mimic namespaces via separate index folders.
- Index only the metadata fields you’ll filter on; keep everything else in per‑item JSON.
- Rough sizing: a 1536‑dim float32 vector is ~6 KB, plus JSON/metadata overhead; size indexes accordingly to your RAM budget.
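To illustrate the folder‑per‑namespace tip above, here is a minimal sketch; the `namespaceIndex` helper and the `./indexes` base path are illustrative conventions, not part of Vectra's API:

```ts
import path from 'node:path';
import { LocalIndex } from 'vectra';

// Hypothetical helper: one folder per "namespace", each backed by its own LocalIndex.
function namespaceIndex(baseDir: string, namespace: string): LocalIndex {
  return new LocalIndex(path.join(baseDir, namespace));
}

const products = namespaceIndex('./indexes', 'products');
const support = namespaceIndex('./indexes', 'support-docs');

// Create each index on first use, indexing only the fields you plan to filter on.
for (const idx of [products, support]) {
  if (!(await idx.isIndexCreated())) {
    await idx.createIndex({ version: 1, metadata_config: { indexed: ['category'] } });
  }
}
```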
Requirements
- Node.js 20.x or newer
- A package manager (npm or yarn)
- An embeddings provider for similarity search:
- OpenAI (API key + model, e.g., text-embedding-3-large or compatible)
- Azure OpenAI (endpoint, deployment name, API key)
- OpenAI‑compatible OSS endpoint (model name + base URL)
- If you plan to ingest web pages via the CLI or API, outbound network access to those URLs
- Sufficient RAM to hold your entire index in memory during queries (see “Performance and limits”)
Install
- npm: `npm install vectra`
- yarn: `yarn add vectra`
CLI usage
- Run without installing globally: `npx vectra --help`
- Optional global install: `npm install -g vectra` (then use `vectra --help`)
Quick Start
Two common paths:
- Path A: you already have vectors (or can generate them) and want to store items + metadata.
- Path B: you have raw text documents; Vectra will chunk, embed, and retrieve relevant spans.
- Create a folder‑backed index
- Choose which metadata fields to index (others are stored per‑item on disk)
- Insert items (vector + metadata)
- Query by vector with optional metadata filters
TypeScript example:

```ts
import path from 'node:path';
import { LocalIndex } from 'vectra';
import { OpenAI } from 'openai';

// 1) Create an index folder
const index = new LocalIndex(path.join(process.cwd(), 'my-index'));

// 2) Create the index (set which metadata fields you want searchable)
if (!(await index.isIndexCreated())) {
  await index.createIndex({
    version: 1,
    metadata_config: { indexed: ['category'] }, // only these fields live in index.json; others go to per-item JSON
  });
}

// 3) Prepare an embeddings helper (use any provider you like)
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
async function getVector(text: string): Promise<number[]> {
  const resp = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return resp.data[0].embedding;
}

// 4) Insert items
await index.insertItem({
  vector: await getVector('apple'),
  metadata: { text: 'apple', category: 'food', note: 'stored on disk if not indexed' },
});
await index.insertItem({
  vector: await getVector('blue'),
  metadata: { text: 'blue', category: 'color' },
});

// 5) Query by vector, optionally filter by metadata
async function query(text: string) {
  const v = await getVector(text);
  // Signature: queryItems(vector, queryString, topK, filter?)
  const results = await index.queryItems(v, '', 3, { category: { $eq: 'food' } });
  for (const r of results) {
    console.log(r.score.toFixed(4), r.item.metadata.text);
  }
}

await query('banana'); // should surface 'apple' in top results
```

Supported filter operators (subset): $and, $or, $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin. Only fields listed in metadata_config.indexed are stored inline and should be used for filtering (everything else is kept per‑item on disk).
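Building on the operator list above, here is a small filter sketch. It reuses `index`, `getVector`, and the indexed `category` field from the example, and follows the `queryItems(vector, queryString, topK, filter)` shape noted in the signature comment there:

```ts
// Combine operators: keep items whose indexed 'category' is 'food' OR is in a set of values.
// Only fields listed in metadata_config.indexed should appear in filters.
const filtered = await index.queryItems(
  await getVector('something sweet'), // query vector
  '',                                 // query text (unused here)
  5,                                  // topK
  { $or: [{ category: { $eq: 'food' } }, { category: { $in: ['color'] } }] }
);
for (const r of filtered) {
  console.log(r.score.toFixed(4), r.item.metadata.text);
}
```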
- Create a document index backed by an embeddings model
- Add documents (raw strings, files, or URLs)
- Query by text; Vectra returns the most relevant chunks grouped by document
- Render top sections for direct drop‑in to prompts
- Optional hybrid retrieval: add BM25 keyword matches alongside semantic matches
TypeScript example:

```ts
import path from 'node:path';
import { LocalDocumentIndex, OpenAIEmbeddings } from 'vectra';

// 1) Configure embeddings (OpenAI, Azure OpenAI, or OpenAI‑compatible OSS)
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'text-embedding-3-small',
  maxTokens: 8000, // batching limit for chunked requests
});

// 2) Create the index
const docs = new LocalDocumentIndex({
  folderPath: path.join(process.cwd(), 'my-doc-index'),
  embeddings,
  // optional: customize chunking
  // chunkingConfig: { chunkSize: 512, chunkOverlap: 0, keepSeparators: true }
});
if (!(await docs.isIndexCreated())) {
  await docs.createIndex({ version: 1 });
}

// 3) Add a document (string); you can also add files/URLs via FileFetcher/WebFetcher or the CLI
const uri = 'doc://welcome';
const text = `
Vectra is a file-backed, in-memory vector DB for Node.js. It supports Pinecone-like metadata filtering
and fast local retrieval. It’s ideal for small, mostly static corpora and prompt augmentation.
`;
await docs.upsertDocument(uri, text, 'md'); // optional docType hints chunking

// 4) Query and render sections for your prompt
const results = await docs.queryDocuments('What is Vectra best suited for?', {
  maxDocuments: 5,
  maxChunks: 20,
  // isBm25: true, // turn on hybrid (semantic + keyword) retrieval
});

// Take top document and render spans of text
if (results.length > 0) {
  const top = results[0];
  console.log('URI:', top.uri, 'score:', top.score.toFixed(4));
  const sections = await top.renderSections(2000, 1, true); // max tokens per section, number of sections, include overlap
  for (const s of sections) {
    console.log('Section score:', s.score.toFixed(4), 'tokens:', s.tokenCount, 'bm25:', s.isBm25);
    console.log(s.text);
  }
}
```

Notes:
- queryDocuments returns LocalDocumentResult objects, each with scored chunks. renderSections merges adjacent chunks, keeps within your token budget, and can optionally add overlapping context for readability.
- Hybrid retrieval: set isBm25: true in queryDocuments to include keyword‑based chunks (Okapi‑BM25) alongside semantic chunks. Each rendered section includes isBm25 to help you distinguish them.
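To drop these results into a prompt, here is a sketch that reuses `docs` from the example above; the token budget, the `isBm25` choice, and the formatting are arbitrary, and `buildContext` is not a Vectra API:

```ts
// Assemble a context block for an LLM prompt from the top-scoring documents.
async function buildContext(question: string, tokenBudget = 2000): Promise<string> {
  const results = await docs.queryDocuments(question, {
    maxDocuments: 3,
    maxChunks: 20,
    isBm25: true, // hybrid retrieval: blend keyword matches with semantic matches
  });

  const parts: string[] = [];
  let used = 0;
  for (const doc of results) {
    const sections = await doc.renderSections(tokenBudget - used, 1, true);
    for (const s of sections) {
      if (used + s.tokenCount > tokenBudget) break;
      parts.push(`Source: ${doc.uri}\n${s.text}`);
      used += s.tokenCount;
    }
    if (used >= tokenBudget) break;
  }
  return parts.join('\n\n---\n\n');
}

console.log(await buildContext('What is Vectra best suited for?'));
```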
CLI in 60 seconds
Three steps: create → add → query. No servers, just a folder.
- Create an index folder
```bash
npx vectra create ./my-doc-index
# or, after global install:
# vectra create ./my-doc-index
```
- Add documents (URLs or local files)
- Prepare a keys.json for your embeddings provider.
OpenAI example:
```json
{
  "apiKey": "sk-...",
  "model": "text-embedding-3-small",
  "maxTokens": 8000
}
```
Azure OpenAI example:
```json
{
  "azureApiKey": "xxxxx",
  "azureEndpoint": "https://your-resource-name.openai.azure.com",
  "azureDeployment": "your-embedding-deployment",
  "azureApiVersion": "2023-05-15",
  "maxTokens": 8000
}
```
OpenAI‑compatible OSS example:
```json
{
  "ossModel": "text-embedding-3-small",
  "ossEndpoint": "https://your-oss-endpoint.example.com",
  "maxTokens": 8000
}
```
Add a single URL:
```bash
npx vectra add ./my-doc-index --keys ./keys.json --uri https://example.com/page
```
Add multiple URLs or files:
```bash
# multiple --uri flags
npx vectra add ./my-doc-index --keys ./keys.json \
  --uri https://example.com/page1 \
  --uri https://example.com/page2 \
  --uri ./local-docs/guide.md

# from a list file (one URL or file path per line)
npx vectra add ./my-doc-index --keys ./keys.json --list ./uris.txt
```
Useful flags:
- --cookie "" to pass auth/session cookies for web pages
- --chunk-size 512 to adjust chunking during ingestion
- Query the index
```bash
# Basic query: returns top documents and renders top sections
npx vectra query ./my-doc-index "What is Vectra best suited for?" --keys ./keys.json
```
Tune output:
```bash
# return up to 3 documents, render 1 section with up to 1200 tokens
npx vectra query ./my-doc-index "hybrid retrieval" \
  --keys ./keys.json \
  --document-count 3 \
  --chunk-count 50 \
  --section-count 1 \
  --tokens 1200 \
  --format sections \
  --overlap true \
  --bm25 true
```
Other commands
- Remove documents by URI:
```bash
npx vectra remove ./my-doc-index --uri https://example.com/page
# or from a list file
npx vectra remove ./my-doc-index --list ./uris.txt
```
- Print index stats:
```bash
npx vectra stats ./my-doc-index
```
- For a full list of commands:
```bash
npx vectra --help
```
Core concepts
Vectra keeps a simple, portable model: indexes live as folders on disk, but are fully loaded into memory at query time. You choose whether to work at the “item” level (you supply vectors + metadata) or the “document” level (Vectra chunks, embeds, and retrieves).
- LocalIndex
- You bring vectors and metadata.
- Configure which metadata fields to “index” (kept inline in index.json) vs store per‑item in external JSON.
- Query by vector with optional metadata filtering; results return items sorted by cosine similarity.
- LocalDocumentIndex
- You bring raw text (strings, files, or URLs).
- Vectra splits text into chunks, generates embeddings (via your configured provider), stores chunk metadata (documentId, startPos, endPos), and persists the document body to disk.
- Query by text; results are grouped per document with handy methods to render scored spans for prompts.
Both are folder‑backed and portable: any language can read/write the on‑disk format.
- Filters are evaluated before similarity ranking using a subset of MongoDB‑style operators:
- Logical: $and, $or
- Comparison: $eq, $ne, $gt, $gte, $lt, $lte
- Sets/strings: $in, $nin
- Indexed vs non‑indexed fields
- Fields listed in metadata_config.indexed are stored inline in index.json and are ideal for filtering.
- All other metadata is stored in a per‑item JSON file on disk to keep index.json small.
- Trade‑off: indexing more fields speeds filtering but increases index.json size.
- LocalDocumentIndex supports semantic retrieval by embeddings and optional keyword retrieval via Okapi‑BM25.
- Enable BM25 per query (isBm25: true) to blend in strong keyword matches alongside semantic chunks.
- Results and rendered sections flag BM25 spans so you can treat them differently in prompts if desired.
- index.json
- version, metadata_config, and an array of items (id, vector, norm, metadata, optional metadataFile).
- For documents, items are chunk entries with metadata including documentId, startPos, endPos (and optional user metadata).
- Per‑item metadata (.json)
- When you choose not to index some fields, full metadata is stored in a separate JSON file (referenced by metadataFile).
- Documents
- Each document body is saved as .txt.
- Optional document‑level metadata is saved as .json.
- A catalog.json maps uri ↔ id and tracks counts (portable and easy to inspect/version).
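Because everything is plain JSON, an index is easy to inspect directly. A minimal sketch that reads index.json from the `my-index` folder created in the Quick Start, using the field names described above (treat this as documentation of the current layout rather than a frozen schema):

```ts
import { promises as fs } from 'node:fs';
import path from 'node:path';

// Peek inside an index folder without going through the Vectra API.
const indexPath = path.join(process.cwd(), 'my-index', 'index.json');
const raw = JSON.parse(await fs.readFile(indexPath, 'utf8'));

console.log('version:', raw.version);
console.log('indexed fields:', raw.metadata_config?.indexed);
console.log('item count:', raw.items?.length ?? 0);

for (const item of (raw.items ?? []).slice(0, 3)) {
  // Each item carries id, vector, cached norm, inline (indexed) metadata,
  // and optionally a metadataFile pointing at the per-item JSON on disk.
  console.log(item.id, item.metadata, item.metadataFile ?? '(no external metadata)');
}
```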
File-backed vs in-memory usage
Vectra uses a single, consistent model: indexes persist as files/folders on disk, but are fully loaded into memory for filtering and similarity ranking.
- Persistent usage
- Choose a stable folder and reuse it across runs.
- Create the index once, then upsert/insert items or documents incrementally.
- Example: ./my-doc-index checked into your project or stored on a local volume.
- Ephemeral usage
- Use a temporary directory per run and rebuild from source content.
- Useful for CI, notebooks, or demos where rebuild cost is low and determinism is desirable.
- Tip: pass deleteIfExists: true on createIndex to reset quickly.
Example:
```ts
import path from 'node:path';
import os from 'node:os';
import { LocalIndex } from 'vectra';

const folderPath = process.env.PERSISTENT_INDEX_DIR
  ? process.env.PERSISTENT_INDEX_DIR
  : path.join(os.tmpdir(), 'vectra-ephemeral');

const index = new LocalIndex(folderPath);
await index.createIndex({
  version: 1,
  deleteIfExists: !process.env.PERSISTENT_INDEX_DIR, // reset if ephemeral
  metadata_config: { indexed: ['category'] },
});
```
Best practices
- Index only what you filter on
- Put frequently used filter fields in metadata_config.indexed to keep index.json small but filterable.
- Store everything else in per‑item JSON (automatically handled).
- Use separate folders as namespaces
- Mimic Pinecone namespaces by creating one index folder per dataset or tenant.
- Batch writes when possible
- Prefer batchInsertItems for item‑level bulk adds; it applies all‑or‑nothing and avoids partial updates.
- For document flow, add/remove documents via upsertDocument/deleteDocument which wrap begin/end updates for you.
- Respect the update lock (a sketch follows this list)
- If you manage updates manually, call beginUpdate → multiple insert/delete → endUpdate.
- Avoid overlapping updates; calling beginUpdate twice throws.
- Choose chunk sizes sensibly (documents)
- Default chunkSize 512 tokens with 0 overlap is a good starting point.
- If queries are long or context is important, consider modest overlap; keep chunkSize under your embedding provider’s maxTokens per request batch.
- `keepSeparators: true` preserves natural boundaries for better section rendering.
- Tune retrieval to your data
- For exact phrases or code terms, enable hybrid retrieval (isBm25: true) to add keyword matches to semantic results.
- Render sections with a realistic token budget for your target LLM; 1000–2000 tokens per section is common.
- Keep vectors consistent
- Use the same embedding model/dimensions across an index.
- Re‑embed and rebuild if you change models.
- Be mindful of memory
- The entire index is loaded into RAM; estimate vector + metadata size and stay within budget.
- Consider multiple smaller indexes instead of one giant index if you have distinct corpora.
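As referenced in the update‑lock bullet above, here is a minimal sketch of a manual bulk update. It reuses `index` and `getVector` from the Quick Start example, and `beginUpdate`/`insertItem`/`endUpdate` are the calls named in this list; per the batch‑writes bullet, `batchInsertItems` is the simpler alternative for item arrays:

```ts
// Wrap a burst of writes in one update so the index is persisted once at the end.
await index.beginUpdate();
for (const text of ['carrot', 'grape', 'red']) {
  await index.insertItem({
    vector: await getVector(text),
    metadata: { text, category: text === 'red' ? 'color' : 'food' },
  });
}
await index.endUpdate(); // persists the batched inserts together
```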
Performance and limits
- How it searches
- Linear scan over all items with cosine similarity; vectors are pre‑normalized and each item caches its norm.
- Results are sorted by similarity and truncated to topK.
- Hybrid mode (documents) optionally adds BM25 keyword matches after the semantic pass.
- Typical latency
- Small indexes: often <1 ms per query.
- Medium local corpora: commonly 1–2 ms; depends on CPU, vector dimensionality, and metadata filtering cost.
- BM25 adds a small overhead proportional to the number of non‑selected chunks it evaluates.
- Memory model
- Entire index is loaded into RAM for querying.
- Rule‑of‑thumb sizing per vector (Node.js in‑memory):
- number[] uses ~8 bytes per element (JS double) + array/object overhead.
- Example: a 1536‑dim vector is ~12 KB of raw numbers, plus per‑item metadata/object overhead.
- On disk, JSON is larger than binary; index.json contains vectors and indexed metadata, while non‑indexed metadata is stored per‑item as separate JSON files.
- Choose dimensions and fields wisely
- Use the smallest embedding dimensionality that meets quality requirements.
- Index only fields you actually filter on to keep index.json smaller and reduce load/parse time.
- Limits and cautions
- Not intended for large, ever‑growing chat memories or multi‑million‑item corpora.
- Very large indexes mean high RAM usage and longer JSON (de)serialization times at startup.
- Sorting all distances is O(n log n); keep n within practical bounds for your machine.
- Embedding generation is external to Vectra; rate limits and throughput depend on your provider and model.
- Web ingestion depends on site availability/format; use --cookie if needed and respect robots/terms.
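To turn the rules of thumb above into numbers, here is a small back‑of‑the‑envelope helper; the 8‑bytes‑per‑element figure and the flat per‑item metadata overhead are the rough estimates from this list, not measurements:

```ts
// Rough RAM estimate for holding `count` vectors of `dims` dimensions in memory.
function estimateIndexBytes(count: number, dims: number, metadataBytesPerItem = 512): number {
  const vectorBytes = dims * 8; // number[] stores JS doubles (~8 bytes per element)
  return count * (vectorBytes + metadataBytesPerItem);
}

// Example: 10,000 chunks of 1536-dim embeddings is on the order of 120 MB.
const bytes = estimateIndexBytes(10_000, 1536);
console.log(`~${(bytes / (1024 * 1024)).toFixed(0)} MB`);
```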
Troubleshooting (quick)
- Missing/invalid embeddings config
- Symptom: “Embeddings model not configured.” or provider errors.
- Fix: For code, pass an OpenAIEmbeddings instance. For CLI, supply a valid keys.json:
- OpenAI: { "apiKey": "...", "model": "text-embedding-3-small", "maxTokens": 8000 }
- Azure OpenAI: { "azureApiKey": "...", "azureEndpoint": "https://...", "azureDeployment": "...", "azureApiVersion": "2023-05-15" }
- OSS: { "ossModel": "text-embedding-3-small", "ossEndpoint": "https://..." }
- Rate limits/timeouts when embedding
- Symptom: “rate_limited” or provider errors.
- Fix: Reduce batch size (chunkSize), add delay/retries (OpenAIEmbeddings has retryPolicy), or upgrade your plan.
- Index already exists
- Symptom: “Index already exists”.
- Fix: Pass deleteIfExists: true to createIndex, or call deleteIndex first.
- Index not found
- Symptom: “Index does not exist”.
- Fix: Call isIndexCreated() and createIndex() before using the index.
- Update lock misuse
- Symptom: “Update already in progress” (double begin) or “No update in progress” (end without begin).
- Fix: Pair beginUpdate → insert/delete → endUpdate. Prefer batchInsertItems or helper methods (upsertDocument) to avoid manual locking.
- Filters return no results
- Symptom: Expected items aren’t matched by metadata filter.
- Fix: Only fields listed in metadata_config.indexed are filterable inline. Ensure the field is included at index creation and that your operators/values ($eq, $in, etc.) match actual data types.
- Dimension mismatch or NaNs (see the guard sketch after this list)
- Symptom: Weird scores or NaN.
- Fix: Keep a single embedding model/dimension per index; re‑embed and rebuild if you change models.
- Node/environment issues
- Symptom: Runtime errors on fs or syntax.
- Fix: Use Node 20.x+, verify file permissions and paths. For local storage, ensure the target folder exists/permissions allow write.
- Corrupt/invalid JSON on disk
- Symptom: JSON parse errors reading index.json or metadata files.
- Fix: Recreate the index (deleteIfExists: true) and re‑ingest, or restore from a clean copy.
- Web fetching problems (CLI)
- Symptom: “invalid content type” or 4xx/5xx.
- Fix: Use --cookie for authenticated pages; ensure URL is reachable and returns text/html or other allowed types.
- BM25 returns nothing
- Symptom: No keyword chunks added.
- Fix: Ensure isBm25: true at query time and a non‑empty query string. Remember only topK BM25 results are blended in after semantic selection.
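For the dimension‑mismatch case above, a cheap guard you can run before inserting. `EXPECTED_DIMS` and `assertValidVector` are illustrative helpers, not part of Vectra; `index` and `getVector` come from the Quick Start example, and the expected dimension is whatever your embedding model produces (1536 is only an example):

```ts
// Validate vectors before insert to avoid mixing dimensions or NaNs in one index.
const EXPECTED_DIMS = 1536; // must match the embedding model used for this index

function assertValidVector(vector: number[]): void {
  if (vector.length !== EXPECTED_DIMS) {
    throw new Error(`expected ${EXPECTED_DIMS} dimensions, got ${vector.length}`);
  }
  if (vector.some((v) => !Number.isFinite(v))) {
    throw new Error('vector contains NaN or Infinity');
  }
}

const vector = await getVector('apple');
assertValidVector(vector);
await index.insertItem({ vector, metadata: { text: 'apple', category: 'food' } });
```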
Next steps
- Explore the APIs
- LocalIndex: item‑level vectors + metadata (createIndex, insertItem, batchInsertItems, queryItems, listItemsByMetadata)
- LocalDocumentIndex: document ingestion + chunking + retrieval (upsertDocument, queryDocuments, listDocuments, renderSections)
- OpenAIEmbeddings: OpenAI/Azure/OSS embeddings helper (createEmbeddings, retryPolicy, maxTokens)
- Utilities: TextSplitter, FileFetcher, WebFetcher, storage backends (LocalFileStorage, VirtualFileStorage)
- Revisit the Quick Start
- Path A (items): the item‑level LocalIndex example in the Quick Start
- Path B (documents): the document‑level LocalDocumentIndex example in the Quick Start
- CLI reference
- npx vectra --help
- Create, add, query, stats, remove (see “CLI in 60 seconds” for examples)
- Other language bindings
- Python: vectra-py — https://github.com/BMS-geodev/vectra-py
- Get involved
- Issues and feature requests: https://github.com/Stevenic/vectra/issues
- Contributing guide: ./CONTRIBUTING.md
- Code of Conduct: ./CODE_OF_CONDUCT.md
License
Vectra is open‑source software licensed under the MIT License.
- Full text: LICENSE
- Contributions: By submitting a contribution, you agree it will be licensed under the MIT License. See CONTRIBUTING
- Community standards: Please review our CODE_OF_CONDUCT