
ReasonKit Mem

Memory & Retrieval Infrastructure for ReasonKit


The Long-Term Memory Layer ("Hippocampus") for AI Reasoning

Documentation | ReasonKit Core | Website


ReasonKit Mem is the memory layer ("Hippocampus") for ReasonKit. It provides vector storage, hybrid search, RAPTOR trees, and embedding support.

Features

  • Vector Storage - Qdrant-based dense vector storage with embedded mode
  • Hybrid Search - Dense (Qdrant) + Sparse (Tantivy BM25) fusion
  • RAPTOR Trees - Hierarchical retrieval for long-form QA
  • Embeddings - Local (BGE-M3) and remote (OpenAI) embedding support
  • Reranking - Cross-encoder reranking for precision

Installation

Universal Installer (Recommended)

Installs all 4 ReasonKit projects together:

curl -fsSL https://get.reasonkit.sh | bash -s -- --with-memory

Platform & Shell Support:

  • ✅ All platforms (Linux/macOS/Windows/WSL)
  • ✅ All shells (Bash/Zsh/Fish/Nu/PowerShell/Elvish)
  • ✅ Auto-detects shell and configures PATH
  • ✅ Beautiful progress visualization

Cargo (Rust Library)

Add to your Cargo.toml:

[dependencies]
reasonkit-mem = "0.1"
tokio = { version = "1", features = ["full"] }

Usage

Basic Usage (Embedded Mode)

use reasonkit_mem::storage::Storage;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create embedded storage (automatic file storage fallback)
    let storage = Storage::new_embedded().await?;

    // Use storage...
    Ok(())
}

Storage with Custom Configuration

use reasonkit_mem::storage::{Storage, EmbeddedStorageConfig};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create storage with custom file path
    let config = EmbeddedStorageConfig::file_only(PathBuf::from("./data"));
    let storage = Storage::new_embedded_with_config(config).await?;

    // Or use Qdrant (requires running server)
    let qdrant_config = EmbeddedStorageConfig::with_qdrant(
        "http://localhost:6333",
        "my_collection",
        1536,
    );
    let qdrant_storage = Storage::new_embedded_with_config(qdrant_config).await?;

    Ok(())
}

Hybrid Search with KnowledgeBase

use reasonkit_mem::retrieval::KnowledgeBase;
use reasonkit_mem::{Document, DocumentType, Source, SourceType};
use chrono::Utc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create in-memory knowledge base
    let kb = KnowledgeBase::in_memory()?;

    // Create a document
    let source = Source {
        source_type: SourceType::Local,
        url: None,
        path: Some("notes.md".to_string()),
        arxiv_id: None,
        github_repo: None,
        retrieved_at: Utc::now(),
        version: None,
    };

    let doc = Document::new(DocumentType::Note, source)
        .with_content("Machine learning is a subset of artificial intelligence.".to_string());

    // Add document to knowledge base
    kb.add(&doc).await?;

    // Search using sparse retrieval (BM25)
    let results = kb.retriever().search_sparse("machine learning", 5).await?;

    for result in results {
        println!("Score: {:.3}, Text: {}", result.score, result.text);
    }

    Ok(())
}

Using Embeddings

use reasonkit_mem::embedding::{EmbeddingPipeline, OpenAIEmbedding};
use reasonkit_mem::retrieval::KnowledgeBase;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create OpenAI embedding provider (requires OPENAI_API_KEY env var)
    let embedding_provider = OpenAIEmbedding::openai()?;
    let pipeline = Arc::new(EmbeddingPipeline::new(Arc::new(embedding_provider)));

    // Create knowledge base with embedding support
    let kb = KnowledgeBase::in_memory()?
        .with_embedding_pipeline(pipeline);

    // Now hybrid search will use both dense (vector) and sparse (BM25)
    // let results = kb.query("semantic search query", 10).await?;

    Ok(())
}

Embedded Mode Documentation

For detailed information about embedded mode, see docs/EMBEDDED_MODE_GUIDE.md.

Architecture

ReasonKit Mem Hybrid Architecture (technical diagram)

The RAPTOR Algorithm (Hierarchical Indexing)

ReasonKit Mem implements RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) to answer high-level questions across large document sets.
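The core idea can be sketched in a few lines of Rust. This is an illustrative toy, not the crate's internal API: leaf chunks are grouped and each group is replaced by a summary, repeatedly, until a single root remains. The `summarize` parameter stands in for an LLM or abstractive summarizer, and real RAPTOR clusters by embedding similarity rather than fixed-size groups.

```rust
/// Build a RAPTOR-style tree bottom-up, returning every level:
/// levels[0] = leaf chunks, levels.last() = single root summary.
fn build_tree(
    mut level: Vec<String>,
    group_size: usize,
    summarize: &dyn Fn(&[String]) -> String,
) -> Vec<Vec<String>> {
    let mut levels = vec![level.clone()];
    while level.len() > 1 {
        // Real RAPTOR clusters by embedding similarity; fixed-size groups here.
        level = level.chunks(group_size).map(|g| summarize(g)).collect();
        levels.push(level.clone());
    }
    levels
}

fn main() {
    // Toy summarizer: concatenates its inputs (a real system would use an LLM).
    let summarize = |g: &[String]| g.join(" + ");
    let leaves: Vec<String> = (1..=4).map(|i| format!("chunk{i}")).collect();
    let levels = build_tree(leaves, 2, &summarize);
    for (depth, nodes) in levels.iter().enumerate() {
        println!("level {depth}: {nodes:?}");
    }
}
```

At query time, retrieval searches across all levels at once, so broad questions can match high-level summaries while specific questions match raw leaf chunks.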

ReasonKit Mem RAPTOR Tree Structure (diagram)

The Memory Dashboard

ReasonKit Mem Dashboard

Integration Ecosystem

ReasonKit Mem Ecosystem

Technology Stack

Component    Technology            Purpose
Qdrant       qdrant-client 1.10+   Dense vector storage
Tantivy      tantivy 0.22+         BM25 sparse search
RAPTOR       Custom Rust           Hierarchical retrieval
Embeddings   BGE-M3 / OpenAI       Dense representations
Reranking    Cross-encoder         Final precision boost
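Dense and sparse retrieval produce two separately ranked lists that must be merged. A common fusion technique is Reciprocal Rank Fusion (RRF), sketched below as a standalone illustration; it is not necessarily the exact `FusionStrategy` the crate ships.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank_d) over the ranked
/// lists containing d. k = 60 is the conventional smoothing constant.
fn rrf_fuse(dense: &[&str], sparse: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [dense, sparse] {
        for (rank, id) in list.iter().enumerate() {
            // Ranks are 1-based; documents in both lists accumulate two terms.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let dense = ["doc_a", "doc_b", "doc_c"];   // dense (vector) ranking
    let sparse = ["doc_b", "doc_d", "doc_a"];  // sparse (BM25) ranking
    for (id, score) in rrf_fuse(&dense, &sparse, 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Documents that appear high in both lists (here `doc_b` and `doc_a`) float to the top, which is why rank-based fusion works well even though dense and BM25 scores live on incompatible scales.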

Project Structure

reasonkit-mem/
├── src/
│   ├── storage/      # Qdrant vector + file-based storage
│   ├── embedding/    # Dense vector embeddings
│   ├── retrieval/    # Hybrid search, fusion, reranking
│   ├── raptor/       # RAPTOR hierarchical tree structure
│   ├── indexing/     # BM25/Tantivy sparse indexing
│   └── rag/          # RAG pipeline orchestration
├── benches/          # Performance benchmarks
├── examples/         # Usage examples
├── docs/             # Additional documentation
└── Cargo.toml

Feature Flags

Feature            Description
default            Core functionality
python             Python bindings via PyO3
local-embeddings   Local BGE-M3 embeddings via ONNX Runtime
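For example, to enable local BGE-M3 embeddings, turn on the feature in Cargo.toml (feature names as listed above):

```toml
[dependencies]
reasonkit-mem = { version = "0.1", features = ["local-embeddings"] }
```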

API Reference

Core Types (re-exported at crate root)

use reasonkit_mem::{
    // Documents
    Document, DocumentType, DocumentContent,
    // Chunks
    Chunk, EmbeddingIds,
    // Sources
    Source, SourceType,
    // Metadata
    Metadata, Author,
    // Search
    SearchResult, MatchSource, RetrievalConfig,
    // Processing
    ProcessingStatus, ProcessingState, ContentFormat,
    // Errors
    MemError, MemResult,
};

Storage Module

use reasonkit_mem::storage::{
    Storage,
    EmbeddedStorageConfig,
    StorageBackend,
    InMemoryStorage,
    FileStorage,
    QdrantStorage,
    AccessContext,
    AccessLevel,
};

Embedding Module

use reasonkit_mem::embedding::{
    EmbeddingProvider,      // Trait for embedding backends
    OpenAIEmbedding,        // OpenAI API embeddings
    EmbeddingConfig,        // Configuration
    EmbeddingPipeline,      // Batch processing pipeline
    EmbeddingResult,        // Single embedding result
    EmbeddingVector,        // Vec<f32> alias
    cosine_similarity,      // Utility function
    normalize_vector,       // Utility function
};
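To illustrate what the vector utilities compute, here is a standalone reimplementation of cosine similarity (an independent sketch, not the crate's code): the dot product of two vectors divided by the product of their magnitudes.

```rust
/// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns 0.0 for a zero-length vector to avoid dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Same direction -> 1.0; orthogonal -> 0.0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[2.0, 0.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```

Normalizing stored vectors to unit length (the role of a `normalize_vector` helper) lets the similarity reduce to a plain dot product at query time.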

Retrieval Module

use reasonkit_mem::retrieval::{
    HybridRetriever,        // Main retrieval engine
    KnowledgeBase,          // High-level API
    HybridResult,           // Search result
    RetrievalStats,         // Statistics
    // Fusion
    FusionEngine,
    FusionStrategy,
    // Reranking
    Reranker,
    RerankerConfig,
};

Version & Maturity

Component          Status      Notes
Vector Storage     ✅ Stable   Qdrant integration production-ready
Hybrid Search      ✅ Stable   Dense + sparse fusion working
RAPTOR Trees       ✅ Stable   Hierarchical retrieval implemented
Embeddings         ✅ Stable   OpenAI API fully supported
Local Embeddings   🔶 Beta     BGE-M3 ONNX (enable with local-embeddings feature)
Python Bindings    🔶 Beta     Build from source with --features python

Current Version: v0.1.2 | CHANGELOG | Releases

Verify Installation

use reasonkit_mem::storage::Storage;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Quick verification - creates in-memory storage
    let storage = Storage::new_embedded().await?;
    println!("ReasonKit Mem initialized successfully!");
    Ok(())
}

License

Apache License 2.0 - see LICENSE


Part of the ReasonKit Ecosystem

ReasonKit Core | ReasonKit Web | Website

"See How Your AI Thinks"