Status: Work In Progress
High-performance MemGPT-style hierarchical memory system for local LLM agents.
A local-only, in-RAM implementation that uses Unix socket IPC to keep agent-to-engine latency low. Inspired by MemGPT and Letta.
MemGPT demonstrated that LLMs can manage their own memory hierarchies, enabling unbounded context through intelligent memory paging. Letta (formerly MemGPT) provides a cloud-hosted implementation.
Olympus Memory Engine takes a different approach:
- Local-only: Your data never leaves your machine
- In-RAM + PostgreSQL: Hot path in memory, cold storage in pgvector
- Unix socket IPC: Microsecond-scale latency for agent-to-engine communication (see the sketch after this section)
- Multi-agent native: Isolated memory spaces with @mention routing
Built for researchers and developers who need MemGPT-style capabilities without cloud dependencies.
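For a feel of the IPC model, here is a minimal sketch of one request/response round trip over a Unix domain socket. The socket path and the newline-delimited JSON framing are assumptions for illustration, not the engine's documented wire protocol:

```python
# Minimal sketch of the Unix-socket IPC pattern.
# The socket path and newline-delimited JSON framing are assumptions;
# the engine's actual wire protocol may differ.
import json
import socket

SOCKET_PATH = "/tmp/ome.sock"  # hypothetical path

def query_engine(request: dict) -> dict:
    """Send one JSON request over a Unix domain socket and read one reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(SOCKET_PATH)
        sock.sendall(json.dumps(request).encode() + b"\n")
        buf = b""
        while not buf.endswith(b"\n"):  # read a newline-terminated reply
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return json.loads(buf)

# Example (hypothetical operation name):
# reply = query_engine({"agent": "alice", "op": "archival_search", "query": "meeting"})
```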
```
┌──────────────────────────────────────────────────────────────┐
│                         Agent Layer                          │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐                  │
│   │  Alice  │    │   Bob   │    │ Charlie │   ... N agents   │
│   └────┬────┘    └────┬────┘    └────┬────┘                  │
│        │              │              │                       │
│        └──────────────┼──────────────┘                       │
│                       │  @mention routing                    │
├───────────────────────┼──────────────────────────────────────┤
│                       ▼                                      │
│               ┌───────────────┐                              │
│               │ Agent Manager │  Unix Socket / IPC           │
│               └───────┬───────┘                              │
│                       │                                      │
├───────────────────────┼──────────────────────────────────────┤
│                       ▼                                      │
│  ┌──────────────────────────────────────────────────────┐   │
│  │             Hierarchical Memory System               │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │   │
│  │  │  System  │ │ Working  │ │   FIFO   │ │ Archival │ │   │
│  │  │  Memory  │ │  Memory  │ │  Queue   │ │ Storage  │ │   │
│  │  │ (static) │ │(editable)│ │ (recent) │ │ (vector) │ │   │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                 │
│                            ▼                                 │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                PostgreSQL + pgvector                 │   │
│  │            768-dim embeddings, HNSW index            │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```
| Tier | Purpose | Persistence | Access Pattern |
|---|---|---|---|
| System | Agent identity, instructions, tool schemas | Static | Read-only |
| Working | Editable facts, conversation state | Per-session | Read/Write |
| FIFO | Recent message history | Sliding window | Auto-overflow |
| Archival | Long-term semantic storage | Permanent | Vector search |
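How the tiers fit together, including the FIFO-to-archival overflow described below, can be sketched in a few lines. Class and method names here are illustrative, not the engine's actual API:

```python
# Illustrative sketch of the four-tier hierarchy and FIFO auto-overflow.
# Names are hypothetical; the real engine stores evictions in pgvector.
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentMemory:
    system: str                                  # static identity, instructions, tool schemas
    working: dict = field(default_factory=dict)  # editable facts, session state
    fifo: deque = field(default_factory=deque)   # recent messages, sliding window
    fifo_capacity: int = 50
    archive: Callable[[str], None] = print       # stand-in for embed + pgvector INSERT

    def append_message(self, message: str) -> None:
        """Push a message; evict the oldest into archival storage when full."""
        self.fifo.append(message)
        while len(self.fifo) > self.fifo_capacity:
            self.archive(self.fifo.popleft())

mem = AgentMemory(system="You are Alice, a helpful research assistant.")
for i in range(60):
    mem.append_message(f"message {i}")  # messages 0-9 overflow to archival
```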
- MemGPT-style memory management: Agents control their own memory operations
- Semantic search: pgvector with HNSW indexing for fast similarity queries
- 768-dim embeddings: Via Ollama's nomic-embed-text (local, no external API calls)
- Memory overflow: FIFO automatically flushes to archival when full
- Isolated memory spaces: Each agent has separate memory partitions
- @mention routing: `@alice hello` routes to a specific agent
- Agent-to-agent messaging: Built-in `message_agent()` tool
- Concurrent operation: Multiple agents can run simultaneously
- File operations: read, write, edit, delete (workspace-isolated)
- Command execution: Whitelisted commands only (ls, cat, grep, python3, etc.)
- Python REPL: Execute Python with timeout enforcement
- Web access: URL fetching (HTTP/HTTPS, size-limited)
- Search: Glob patterns and regex content search
- Workspace isolation: All file ops sandboxed to agent workspace
- Path traversal prevention: No `../` escapes allowed (see the sketch after this list)
- Command injection blocking: Shell operators detected and blocked
- Timeout enforcement: All operations have configurable limits
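The path traversal and command injection checks can be sketched with standard-library path resolution and a whitelist test. This shows the general technique only; the engine's real validation may be stricter, and the function names are hypothetical:

```python
# Sketch of workspace isolation, ../ escape prevention, and command whitelisting.
from pathlib import Path

SHELL_OPERATORS = set(";|&$`><")

def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything outside the workspace."""
    candidate = (workspace / user_path).resolve()
    if not candidate.is_relative_to(workspace.resolve()):
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate

def check_command(cmd: str, whitelist: frozenset[str]) -> None:
    """Reject shell metacharacters and non-whitelisted binaries."""
    if any(ch in SHELL_OPERATORS for ch in cmd):
        raise PermissionError("shell operators are blocked")
    if cmd.split()[0] not in whitelist:
        raise PermissionError("command not whitelisted")

ws = Path("/srv/agents/alice")                 # hypothetical workspace root
resolve_in_workspace(ws, "notes.txt")          # ok
# resolve_in_workspace(ws, "../bob/secret")    # raises PermissionError
check_command("grep meeting notes.txt", frozenset({"ls", "cat", "grep", "python3"}))
```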
Prerequisites:

```bash
# PostgreSQL 14+ with pgvector
sudo apt install postgresql postgresql-contrib
psql -c "CREATE EXTENSION vector;"

# Ollama for local LLM + embeddings
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull gpt-oss:20b
ollama pull nomic-embed-text
```

Install and run:

```bash
cd olympus-memory-engine
poetry install
poetry run python scripts/init_database.py
poetry run ome
```

Example session:

```
Olympus Memory Engine v0.1.0
Agents: alice, bob
Type @agent message or /help
> @alice create a file called notes.txt with "Meeting at 3pm"
Alice: I've created notes.txt with your meeting reminder.
> @bob read the notes.txt file
Bob: The file contains: "Meeting at 3pm"
> @alice remember that bob knows about the meeting
Alice: I've stored that in my working memory.
> @alice what do you remember about bob?
Alice: Bob knows about the meeting scheduled for 3pm.
```
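The @mention routing seen in the session above boils down to parsing the agent name off the front of the line and dispatching to that agent's handler. A minimal sketch (the shell's actual parser may differ):

```python
# Sketch of @mention parsing and routing; illustrative only.
import re

MENTION = re.compile(r"^@(\w+)\s+(.*)$", re.DOTALL)

def route(line: str, agents: dict) -> str:
    match = MENTION.match(line.strip())
    if not match:
        return "Type @agent message or /help"
    name, message = match.groups()
    if name not in agents:
        return f"Unknown agent: {name}"
    return agents[name](message)  # dispatch to that agent's handler

agents = {"alice": lambda m: f"Alice received: {m}"}
print(route("@alice create a file called notes.txt", agents))
```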
Edit `config.yaml`:

```yaml
database:
  host: localhost
  port: 5432
  name: olympus_memory
  user: postgres

embedding:
  model: nomic-embed-text
  dimensions: 768

agents:
  alice:
    model: gpt-oss:20b
    system_prompt: "You are Alice, a helpful research assistant."
    tools_enabled: true
  bob:
    model: gpt-oss:20b
    system_prompt: "You are Bob, a software engineer."
    tools_enabled: true
```

Run the tests:

```bash
# All tests (145 total)
poetry run pytest
# By category
poetry run pytest -m unit # Fast unit tests
poetry run pytest -m integration # Requires PostgreSQL + Ollama
# Specific modules
poetry run pytest tests/test_tools_security.py -v # 29 security tests
poetry run pytest tests/test_memory_storage.py -v  # PostgreSQL tests
```

Quality checks:

```bash
poetry run mypy src --pretty      # Type checking (0 errors)
poetry run ruff check src tests # Linting
poetry run ruff format src tests  # Formatting
```

Project layout:

```
olympus-memory-engine/
├── src/
│   ├── agents/          # Agent manager, MemGPT agent implementation
│   ├── memory/          # 4-tier memory hierarchy, PostgreSQL backend
│   ├── tools/           # Sandboxed tool system
│   ├── llm/             # Ollama client, embedding generation
│   ├── ui/              # Interactive shell with @mention parsing
│   └── infrastructure/  # Logging, metrics, configuration
├── tests/               # 145 tests (unit + integration + security)
├── scripts/             # Database initialization, utilities
├── docs/                # Architecture documentation
└── config.yaml          # Agent and database configuration
```
This project draws inspiration from:
- MemGPT (Packer et al., 2023): "MemGPT: Towards LLMs as Operating Systems" - The foundational paper on LLM-managed hierarchical memory
- Letta: The official MemGPT implementation (cloud-hosted)
- pgvector: PostgreSQL extension for vector similarity search
Key differences from Letta:
- Local-only (no cloud, no API keys required for core functionality)
- Unix socket IPC for minimal latency
- PostgreSQL-native (vs. external vector DBs)
- Multi-agent first design
| Metric | Target | Current |
|---|---|---|
| Memory query latency | <1ms | ~0.15ms (pgvector HNSW) |
| Embedding generation | <50ms | ~30ms (nomic-embed-text) |
| Agent response time | <2s | Model-dependent |
| Concurrent agents | 10+ | Tested with 5 |
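The recall path behind the first two rows can be sketched end to end: embed the query locally via Ollama, then run a cosine-distance search that the HNSW index accelerates. The table and column names (`archival_memory`, `content`, `embedding`) are assumptions about the schema:

```python
# Sketch of the semantic-recall path: Ollama embedding + pgvector HNSW search.
# Schema names are assumptions; adjust to the actual database layout.
import psycopg
import requests

def embed(text: str) -> list[float]:
    """768-dim embedding via Ollama's local embeddings endpoint."""
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["embedding"]

def search(query: str, limit: int = 5) -> list[tuple[str, float]]:
    vec = "[" + ",".join(f"{x:.6f}" for x in embed(query)) + "]"
    with psycopg.connect("dbname=olympus_memory user=postgres") as conn:
        return conn.execute(
            # <=> is pgvector's cosine-distance operator; an HNSW index on
            # the embedding column turns this into an approximate-NN scan.
            "SELECT content, embedding <=> %s::vector AS distance "
            "FROM archival_memory ORDER BY distance LIMIT %s",
            (vec, limit),
        ).fetchall()

# search("meeting at 3pm")
```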
On the roadmap:
- Unix domain socket server mode (for IPC from other processes)
- C++ CUDA acceleration for vector operations
- Memory compaction and summarization
- Cross-agent memory sharing (with permissions)
- Prometheus metrics endpoint
MIT License - See LICENSE for details.
Built on the shoulders of giants:
- The MemGPT team for the hierarchical memory paradigm
- Letta for proving the concept at scale
- The Ollama team for making local LLMs accessible
- pgvector maintainers for excellent PostgreSQL vector support