MemEx (Memory Extension)

A personal LLM leveraging my corpus as its knowledge base.

Data sources:

  • Google
    • Emails (mbox)
    • Calendar (ics)
    • Google Search History (html, json)
    • YouTube Search History (html)
    • Contacts (vcf)
    • Drive (docx)
    • Maps (csv)
  • Apple
    • Notes (txt)
    • Contacts (vcf)
    • Messages (csv)
    • Calendar (ics)

Currently included data:

  • Apple notes
  • Apple contacts
  • Apple messages
  • Google calendar
  • Google maps

Implementation Steps

  1. Download data from the sources using Google Takeout and iCloud exports
  2. Process the data: clean, chunk, and generate embeddings
  3. Store the embeddings in a vector DB
  4. Leverage an open-source LLM for inference and to perform RAG
  5. Use Discord or another app to serve as MemEx's UI
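
As a rough illustration of the chunking part of step 2, here is a minimal sketch that greedily packs paragraphs into chunks under a character limit. It is a stand-in written with the standard library only, not the project's actual chunker (the Architecture section lists LangChain + Unstructured.io for that). The function name `chunk_text` and the paragraph-based strategy are assumptions made for this example; the 1500 default mirrors the `--chunk_max_characters 1500` value shown under Usage.

```python
def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Hypothetical helper for illustration; not part of the MemEx codebase.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs produced by extra blank lines
        # Hard-split any paragraph that exceeds the limit on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # the piece still fits in the open chunk
            else:
                chunks.append(current)  # close the open chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the embedding model and stored alongside its vector. Splitting on paragraph boundaries first tends to keep related sentences together, which is usually what retrieval benefits from.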

Architecture

  1. Parsing, chunking, preprocessing
    • LangChain + Unstructured.io
  2. Embedding
    • BAAI/bge-large-en-v1.5 + Together.ai
  3. Vector DB
    • Chroma
  4. Retriever & RAG Model
    • meta-llama/Llama-3.3-70B-Instruct-Turbo
    • Qwen/Qwen2.5-7B-Instruct-Turbo
  5. Optional UI
    • Discord / iMessage (using BlueBubbles) / Gradio
  6. Backend hosting
    • Fly.io
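
To make the retrieval step of this architecture concrete, the sketch below ranks stored chunks by cosine similarity to a query vector and assembles a prompt from the best matches. It is a toy, in-memory stand-in for what Chroma and the embedding model do in the real pipeline: the two-dimensional vectors, the sample texts, and the names `cosine_similarity`, `top_k`, and `build_prompt` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a simple RAG prompt: retrieved context first, then the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# Toy data: hypothetical chunks with made-up 2-D "embeddings".
store = [
    ("note about dentist", [1.0, 0.0]),
    ("message about dinner", [0.0, 1.0]),
    ("calendar: dentist 3pm", [0.9, 0.1]),
]
prompt = build_prompt("When is my dentist appointment?",
                      top_k([1.0, 0.0], store, k=2))
```

In the actual stack, the vectors would come from the embedding model, the nearest-neighbor search would be handled by the vector DB, and the prompt would be sent to one of the listed LLMs.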

Usage

  1. Create .env with required env variables (reference .env.example)
  2. Generate embeddings from the data directory and store them in the Chroma vector DB
    • python3 src/scripts/generate_embeddings.py --folder_path ./data --chunk_max_characters 1500
  3. Launch FastAPI server to handle requests between LLM and messaging services
    • python3 -m src.app
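
Before running steps 2 and 3, it can help to confirm that `.env` defines every variable named in `.env.example`. The sketch below is one way to do that with the standard library; it is not part of the repository, and the helper names `parse_env_keys` and `missing_keys` are hypothetical. It assumes the usual `KEY=value` line format with optional `#` comments and an optional `export ` prefix.

```python
from pathlib import Path


def parse_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, ignoring comments and blanks."""
    keys: set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key:
            keys.add(key)
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Names present in the example file but absent from the real .env."""
    return sorted(parse_env_keys(example_text) - parse_env_keys(env_text))


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        missing = missing_keys(example.read_text(), env.read_text())
        print("Missing variables:", ", ".join(missing) if missing else "none")
    else:
        print("Expected both .env.example and .env in the current directory.")
```

The check compares names only; it does not validate that the values are non-empty or correct.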

Deployment

To deploy onto fly.io:

  1. fly auth login
  2. fly deploy (fly launch if first time)
    • A volume should be provisioned alongside the app
  3. Transfer local chroma_db data into volume
    1. Copy sqlite3 file
      1. fly ssh sftp shell
      2. put ./chroma_db/chroma.sqlite3 /chroma_db/chroma.sqlite3
      3. exit
    2. Create 36e27b04-a9... directory
      1. fly ssh console
      2. mkdir -p /chroma_db/36e27b04-a9...
      3. exit
    3. Copy remaining files
      1. fly ssh sftp shell
      2. put ./chroma_db/…/data_level0.bin
      3. Repeat for each remaining file

If the connection is lost:

  1. Restart the machine from the fly.io dashboard, OR
  2. Exit and then run fly ssh sftp shell again
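
Because step 3 of the deployment repeats the same `mkdir -p` and `put` commands for every directory and file, a small helper can list them up front. The sketch below walks a local `chroma_db` directory and returns the commands to type into `fly ssh console` and `fly ssh sftp shell`. It only generates text; it does not talk to Fly.io. The function name `plan_transfer` is hypothetical, and the `/chroma_db` remote root is taken from the paths shown above.

```python
from pathlib import Path, PurePosixPath


def plan_transfer(local_root, remote_root: str = "/chroma_db"):
    """Return (mkdir_commands, put_commands) mirroring local_root onto remote_root.

    Paths containing spaces are not quoted here, so they would need manual handling.
    """
    local_root = Path(local_root)
    mkdirs: list[str] = []
    puts: list[str] = []
    for path in sorted(local_root.rglob("*")):
        rel = path.relative_to(local_root)
        remote = PurePosixPath(remote_root, *rel.parts)
        if path.is_dir():
            # Run these inside `fly ssh console` before uploading files.
            mkdirs.append(f"mkdir -p {remote}")
        elif path.is_file():
            # Run these inside `fly ssh sftp shell`.
            puts.append(f"put {path.as_posix()} {remote}")
    return mkdirs, puts


if __name__ == "__main__":
    if Path("./chroma_db").is_dir():
        mkdir_cmds, put_cmds = plan_transfer("./chroma_db")
        print("\n".join(mkdir_cmds + put_cmds))
```

Printing the plan first also makes it easier to resume after a dropped connection, since the remaining `put` lines can be pasted into a fresh sftp session.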

Setup Issues

If running python3 src/scripts/generate_embeddings.py fails with the following error:
[nltk_data] Error loading punkt_tab: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1000)>

Fix: sudo /Applications/Python\ 3.12/Install\ Certificates.command

If you encounter the following error:
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Fix: brew install poppler
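
The poppler error can be caught before a long embedding run with a quick preflight check. The sketch below looks for the `pdfinfo` and `pdftoppm` executables, which poppler provides and which pdf2image invokes, on the current PATH. The helper name `missing_tools` is hypothetical and the check is not part of the repository.

```python
import shutil

# Command-line tools shipped with poppler that pdf2image relies on.
REQUIRED_TOOLS = ("pdfinfo", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
        print("On macOS, `brew install poppler` should provide them.")
    else:
        print("poppler tools found.")
```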

Docs

Attribution

This project idea was inspired by this post from Linus: https://x.com/thesephist/status/1629272600156176386
