MemRec: Collaborative Memory-Augmented Agentic Recommender System

MemRec is a memory-augmented agentic recommender system that delivers efficient, personalized recommendations through collaborative memory mechanisms and large language models.

If you find our work helpful, please consider citing our paper:

@article{chen2026memrec,
  title   = {MemRec: Collaborative Memory-Augmented Agentic Recommender System},
  author  = {Chen, Weixin and Zhao, Yuhan and Huang, Jingyuan and Ye, Zihe and Ju, Clark Mingxuan and Zhao, Tong and Shah, Neil and Chen, Li and Zhang, Yongfeng},
  year    = 2026,
  journal = {arXiv preprint arXiv:2601.08816},
  url     = {https://arxiv.org/abs/2601.08816}
}

πŸ“ Project Structure

memrec/
├── configs/             # Experiment configurations
├── scripts/             # Run scripts (train, eval, data processing)
└── src/
    ├── memory/          # Memory mechanisms (Storage, Pruner, Graph)
    ├── models/          # MemRec Agent & LLM Clients
    ├── train/           # Trainer & Metrics
    └── data/            # Dataset loaders & Samplers

🚀 Quick Start

1. Environment Setup

# Create conda environment
conda create -n memrec python=3.10
conda activate memrec

# Install dependencies
pip install -r requirements.txt

Requirements:

  • Python 3.10+
  • PyTorch 2.9.0+
  • CUDA 12.1+ (recommended for accelerating candidate retrieval models)
  • LLM API support (Azure OpenAI, local vLLM, etc.)

2. Configure API Keys

MemRec requires LLM API access. Set environment variables:

# Azure OpenAI (recommended)
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"

If your LLM service uses only an API key without a custom endpoint (e.g., ChatGPT or Gemini), omit the ENDPOINT variable, use your API key directly, and adapt the LLM calling interface function to match your service.
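
For example, a minimal key-only adaptation using the official OpenAI Python client might look like the sketch below (the function name and wiring are illustrative, not part of the MemRec codebase; the actual interface to modify lives in src/models/llm_client.py):

# Hypothetical sketch of a key-only LLM call (e.g., the official OpenAI API).
# generate_with_openai() is an illustrative name, not a MemRec function.
import os
from openai import OpenAI  # pip install openai

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # no custom endpoint needed

def generate_with_openai(prompt, model="gpt-4o-mini", max_tokens=4000):
    # Single-turn chat completion; returns the generated text
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content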

3. Download Datasets

Download the InstructRec datasets published by iAgent:

📦 Google Drive Link: InstructRec Datasets

After downloading, extract the datasets to the data/iagent/ directory:

# Create iagent directory if it doesn't exist
mkdir -p data/iagent

# Extract datasets to data/iagent/ directory
# Place all downloaded files (*.pkl and *.csv) into data/iagent/

# Convert all InstructRec datasets from iAgent format to MemRec format
bash scripts/convert_all_instructrec.sh

# Verify processed datasets
ls data/processed/
# Should see: instructrec-books, instructrec-goodreads, instructrec-movietv, instructrec-yelp

Supported Datasets:

  • instructrec-books: Book recommendations
  • instructrec-goodreads: Goodreads books
  • instructrec-movietv: Movie and TV recommendations
  • instructrec-yelp: Yelp business recommendations

4. Run MemRec

Basic Usage

python scripts/run_train.py \
  --model memrec_agent \
  --dataset instructrec-books \
  --config configs/memrec_instructrec-books.yaml \
  --device cuda:0

Custom Configuration

Additional flags: --n_eval_users (number of evaluation users), --n_eval_candidates (number of candidate items per user), --parallel (enable parallel evaluation), --parallel_workers (number of parallel workers).

python scripts/run_train.py \
  --model memrec_agent \
  --dataset instructrec-books \
  --config configs/memrec_instructrec-books.yaml \
  --device cuda:0 \
  --n_eval_users 100 \
  --n_eval_candidates 10 \
  --parallel \
  --parallel_workers 8

5. View Results

# View detailed results
cat results/runs/instructrec-books_memrec_agent_seed42_*.json | python -m json.tool

# View LLM conversation logs (if --save_llm_conversations was enabled)
ls results/runs/*/llm_conversations/

LLM Configuration

Configure the LLM provider in your experiment config file (configs/*.yaml):

provider:
  name: azure_openai         # azure_openai, qwen, llama, etc.
  model: gpt-4o-mini
  endpoint: ${ENV:AZURE_OPENAI_ENDPOINT}
  api_key: ${ENV:AZURE_OPENAI_API_KEY}
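
The ${ENV:...} placeholders pull values from the environment variables set in step 2. A minimal sketch of how such placeholders can be resolved at load time (an illustration assuming PyYAML, not MemRec's actual config loader):

# Sketch only: resolve ${ENV:VAR} placeholders in a YAML config from os.environ.
import os
import re

import yaml  # pip install pyyaml

ENV_PATTERN = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")

def resolve_env(value):
    # Recursively substitute ${ENV:VAR} with os.environ["VAR"] in strings
    if isinstance(value, str):
        return ENV_PATTERN.sub(lambda m: os.environ[m.group(1)], value)
    if isinstance(value, dict):
        return {k: resolve_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env(v) for v in value]
    return value

with open("configs/memrec_instructrec-books.yaml") as f:
    config = resolve_env(yaml.safe_load(f))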

📊 Evaluation Metrics

MemRec uses the following metrics for evaluation (default K ∈ {1, 3, 5, 10}); a reference computation is sketched after the list:

  • Hit@K: Whether the target item is in the Top-K
  • NDCG@K: Normalized Discounted Cumulative Gain, considering ranking positions
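
A minimal reference computation for the single-target case (a sketch, not the implementation in src/train/): with one relevant item, IDCG = 1, so NDCG@K reduces to 1/log2(rank + 2) at a 0-based rank.

import math

def hit_at_k(ranked_items, target, k):
    # Hit@K: 1.0 if the target item appears in the top-K, else 0.0
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    # NDCG@K with a single relevant item: 1 / log2(rank + 2) at 0-based rank
    for rank, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(rank + 2)
    return 0.0

# Example: target item 9 sits at rank 2 (0-based), inside the top-3
print(hit_at_k([5, 2, 9, 1], 9, 3))   # 1.0
print(ndcg_at_k([5, 2, 9, 1], 9, 3))  # 1 / log2(4) = 0.5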

🧠 Core Modules

1. Memory Manager

Manages user and item memories:

  • Dynamic memory content updates
  • Cross-user knowledge sharing support
  • Automatic pruning of expired or low-quality memories

from src.memory.manager import MemoryManager

memory_manager = MemoryManager(config)
memory_manager.warmup(train_data)  # Warm-up phase
recommendations = memory_manager.recommend(user_id, candidates)

2. Memory Pruner

Selects the most relevant memories for context construction:

  • llm_rules: Uses LLM-generated domain rules
  • hybrid_rule: Feature-weighted hybrid rules

from src.memory.pruner import MemoryPruner

pruner = MemoryPruner(mode='llm_rules')
selected_memories = pruner.prune(candidate_memories, target_user, budget)

3. LLM Client

Unified LLM interface supporting multiple providers:

from src.models.llm_client import LLMClient

llm_client = LLMClient(provider='azure_openai', model='gpt-4o-mini')
response = llm_client.generate(prompt, max_tokens=4000)

4. Reranker Module

Performs precise ranking of candidate items:

  • LLM Reranker: Uses LLM to understand reasons and rank
  • Vector Reranker: Fast ranking based on vector similarity

from src.models.reranker_llm import LLMReranker

reranker = LLMReranker(llm_client)
ranked_items = reranker.rerank(user_profile, candidates, reasons)

🔬 Advanced Usage

Custom Domain Rules

Add new domain rule files in src/memory/domain_rules/:

# src/memory/domain_rules/custom_rules.py
def get_custom_rules():
    return {
        'user_preference': 'weight=0.8',
        'item_quality': 'weight=0.7',
        'recency': 'weight=0.6',
        # Add more rules...
    }
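
As a purely hypothetical illustration (not MemRec's actual wiring, which lives in src/memory/pruner.py), such a rules dict could drive feature-weighted memory scoring in the spirit of the hybrid_rule mode:

# Hypothetical: score a memory by weighting its features with the rules above.
def parse_weight(rule_value):
    # Extract the float from a 'weight=0.8'-style rule string
    key, _, value = rule_value.partition("=")
    assert key == "weight", f"unexpected rule format: {rule_value}"
    return float(value)

def score_memory(memory_features, rules):
    # Weighted sum of per-feature scores; features without a rule contribute 0
    return sum(
        parse_weight(rules[name]) * feature
        for name, feature in memory_features.items()
        if name in rules
    )

rules = {"user_preference": "weight=0.8", "item_quality": "weight=0.7", "recency": "weight=0.6"}
memory = {"user_preference": 0.9, "item_quality": 0.5, "recency": 0.2}
print(score_memory(memory, rules))  # 0.8*0.9 + 0.7*0.5 + 0.6*0.2 = 1.19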

Parallel Evaluation Optimization

Increase parallel workers to accelerate evaluation:

python scripts/run_train.py \
  --model memrec_agent \
  --dataset instructrec-books \
  --config configs/memrec_instructrec-books.yaml \
  --parallel \
  --parallel_workers 32  # Adjust based on CPU cores

❓ FAQ

Q1: LLM API Call Failure

Solution:

  • Check if environment variables are correctly set
  • Verify API key validity
  • Check network connection and API quota

# Verify environment variables
echo $AZURE_OPENAI_ENDPOINT
echo $AZURE_OPENAI_API_KEY

Q2: Slow Evaluation Speed

Solution:

  • Use vector reranker: set reranker_mode: vector in config
  • Reduce evaluation users: --n_eval_users 100
  • Increase parallel threads: --parallel --parallel_workers 16

Q3: How to Reproduce Paper Results

Ensure using the same configuration:

# For full evaluation (all test users)
python scripts/run_train.py \
  --model memrec_agent \
  --dataset instructrec-books \
  --config configs/memrec_instructrec-books.yaml \
  --seed 42

# For 1k sampled users evaluation (for reproducibility; see the note below)
python scripts/run_train.py \
  --model memrec_agent \
  --dataset instructrec-books \
  --config configs/memrec_instructrec-books_1k.yaml \
  --seed 42

Note for 1k evaluation: The memrec_instructrec-books_1k.yaml config file already includes eval_user_list: eval_user_sample_1k_instructrec-books.json. Ensure this JSON file is placed in the project root directory (same level as scripts/ and configs/).

Q4: Custom Dataset

  1. Prepare data in unified format (user_id, item_id, rating, timestamp)
  2. Map IDs to 0-based integers
  3. Save as a .inter file to data/processed/your-dataset/ (see the sketch after this list)
  4. Copy and modify configuration file
  5. Run training script
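
A minimal sketch for steps 1-3, assuming pandas and a RecBole-style tab-separated .inter layout (check the converted datasets under data/processed/ for the exact header your setup expects; the input filename is a placeholder):

# Sketch: convert a raw CSV (user_id, item_id, rating, timestamp) to .inter
import os

import pandas as pd

df = pd.read_csv("raw_interactions.csv")  # placeholder input file

# Step 2: map raw user/item IDs to 0-based contiguous integers
df["user_id"] = pd.factorize(df["user_id"])[0]
df["item_id"] = pd.factorize(df["item_id"])[0]

# Step 3: write a tab-separated .inter file under data/processed/your-dataset/
out_dir = "data/processed/your-dataset"
os.makedirs(out_dir, exist_ok=True)
df.columns = ["user_id:token", "item_id:token", "rating:float", "timestamp:float"]
df.to_csv(os.path.join(out_dir, "your-dataset.inter"), sep="\t", index=False)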
