MemRec is a memory-augmented intelligent recommender system that achieves efficient personalized recommendations through collaborative memory mechanisms and large language models.
If you find our work helpful, please consider citing our paper:
```bibtex
@article{chen2026memrec,
  title   = {MemRec: Collaborative Memory-Augmented Agentic Recommender System},
  author  = {Chen, Weixin and Zhao, Yuhan and Huang, Jingyuan and Ye, Zihe and Ju, Clark Mingxuan and Zhao, Tong and Shah, Neil and Chen, Li and Zhang, Yongfeng},
  year    = {2026},
  journal = {arXiv preprint arXiv:2601.08816},
  url     = {https://arxiv.org/abs/2601.08816}
}
```

Project layout:

```
memrec/
├── configs/      # Experiment configurations
├── scripts/      # Run scripts (train, eval, data processing)
└── src/
    ├── memory/   # Memory mechanisms (Storage, Pruner, Graph)
    ├── models/   # MemRec Agent & LLM Clients
    ├── train/    # Trainer & Metrics
    └── data/     # Dataset loaders & Samplers
```
```bash
# Create conda environment
conda create -n memrec python=3.10
conda activate memrec

# Install dependencies
pip install -r requirements.txt
```

Requirements:
- Python 3.10+
- PyTorch 2.9.0+
- CUDA 12.1+ (recommended for accelerating candidate retrieval models)
- LLM API support (Azure OpenAI, local vLLM, etc.)
MemRec requires LLM API access. Set environment variables:
```bash
# Azure OpenAI (recommended)
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
```

If your LLM service is key-only (e.g., the ChatGPT or Gemini APIs), simply omit the ENDPOINT variable, use your API key directly, and adapt the LLM calling interface function to match your service.
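Before launching a run, it can help to fail fast when credentials are missing. A minimal sketch (the variable names follow the exports above; the helper itself is illustrative, not part of MemRec):

```python
import os

def check_llm_env(required=("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")):
    """Return the list of required LLM environment variables that are unset."""
    return [name for name in required if not os.environ.get(name)]

missing = check_llm_env()
if missing:
    print(f"Missing LLM credentials: {', '.join(missing)}")
```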
Download the InstructRec datasets published by iAgent:
📦 Google Drive Link: InstructRec Datasets
After downloading, extract the datasets to the data/iagent/ directory:
```bash
# Create iagent directory if it doesn't exist
mkdir -p data/iagent

# Extract datasets to data/iagent/ directory
# Place all downloaded files (*.pkl and *.csv) into data/iagent/
```
```bash
# Convert all InstructRec datasets from iAgent format to MemRec format
bash scripts/convert_all_instructrec.sh

# Verify processed datasets
ls data/processed/
# Should see: instructrec-books, instructrec-goodreads, instructrec-movietv, instructrec-yelp
```

Supported Datasets:
- instructrec-books: Book recommendations
- instructrec-goodreads: Goodreads books
- instructrec-movietv: Movie and TV recommendations
- instructrec-yelp: Yelp business recommendations
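After conversion, a short script can confirm that every expected dataset directory was produced (the dataset names come from the list above; the helper is illustrative, not part of the repo):

```python
from pathlib import Path

def missing_datasets(base, names):
    """Return the dataset names that have no directory under `base`."""
    root = Path(base)
    return [name for name in names if not (root / name).is_dir()]

datasets = ["instructrec-books", "instructrec-goodreads",
            "instructrec-movietv", "instructrec-yelp"]
print(missing_datasets("data/processed", datasets))
```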
Basic training run:

```bash
python scripts/run_train.py \
    --model memrec_agent \
    --dataset instructrec-books \
    --config configs/memrec_instructrec-books.yaml \
    --device cuda:0
```

Training with evaluation options:

```bash
python scripts/run_train.py \
    --model memrec_agent \
    --dataset instructrec-books \
    --config configs/memrec_instructrec-books.yaml \
    --device cuda:0 \
    --n_eval_users 100 \
    --n_eval_candidates 10 \
    --parallel \
    --parallel_workers 8
```

- `--n_eval_users`: number of evaluation users
- `--n_eval_candidates`: number of candidate items
- `--parallel`: enable parallel evaluation
- `--parallel_workers`: number of parallel workers

Inspect the results:

```bash
# View detailed results
cat results/runs/instructrec-books_memrec_agent_seed42_*.json | python -m json.tool

# View LLM conversation logs (if --save_llm_conversations was enabled)
ls results/runs/*/llm_conversations/
```

LLM provider configuration:

```yaml
provider:
  name: azure_openai   # azure_openai, qwen, llama, etc.
  model: gpt-4o-mini
  endpoint: ${ENV:AZURE_OPENAI_ENDPOINT}
  api_key: ${ENV:AZURE_OPENAI_API_KEY}
```

MemRec uses the following metrics for evaluation (default K ∈ {1, 3, 5, 10}):
- Hit@K: Whether the target item is in the Top-K
- NDCG@K: Normalized Discounted Cumulative Gain, considering ranking positions
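For a single test interaction with one target item, both metrics reduce to simple formulas; a reference sketch matching the standard definitions (not necessarily the exact code in `src/train`):

```python
import math

def hit_at_k(ranked_items, target, k):
    """1.0 if the target appears in the top-k of the ranked list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """With a single relevant item, NDCG@K = 1 / log2(rank + 2) for 0-based rank."""
    if target in ranked_items[:k]:
        rank = ranked_items.index(target)
        return 1.0 / math.log2(rank + 2)
    return 0.0

ranked = ["b", "a", "c", "d"]
print(hit_at_k(ranked, "a", 1), hit_at_k(ranked, "a", 3))  # target "a" is at rank 1
print(ndcg_at_k(ranked, "a", 3))                           # 1 / log2(3)
```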
The Memory Manager (`src/memory/manager.py`) manages user and item memories:
- Dynamic memory content updates
- Cross-user knowledge sharing support
- Automatic pruning of expired or low-quality memories
```python
from src.memory.manager import MemoryManager

memory_manager = MemoryManager(config)
memory_manager.warmup(train_data)  # Warm-up phase
recommendations = memory_manager.recommend(user_id, candidates)
```

The Memory Pruner selects the most relevant memories for context construction:
- llm_rules: Uses LLM-generated domain rules
- hybrid_rule: Feature-weighted hybrid rules
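As an illustration of how a feature-weighted hybrid rule might combine evidence, here is a hypothetical scoring helper (the feature names and weights are invented for the example; the actual rule set lives under `src/memory/domain_rules/`):

```python
def hybrid_rule_score(features, weights):
    """Weighted sum of memory features; higher-scoring memories survive pruning."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical memory with normalized feature values
memory_features = {"user_preference": 0.9, "item_quality": 0.5, "recency": 0.2}
weights = {"user_preference": 0.8, "item_quality": 0.7, "recency": 0.6}
print(hybrid_rule_score(memory_features, weights))  # 0.9*0.8 + 0.5*0.7 + 0.2*0.6
```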
```python
from src.memory.pruner import MemoryPruner

pruner = MemoryPruner(mode='llm_rules')
selected_memories = pruner.prune(candidate_memories, target_user, budget)
```

The LLM Client provides a unified interface supporting multiple providers:
```python
from src.models.llm_client import LLMClient

llm_client = LLMClient(provider='azure_openai', model='gpt-4o-mini')
response = llm_client.generate(prompt, max_tokens=4000)
```

The Reranker performs precise ranking of candidate items:
- LLM Reranker: Uses LLM to understand reasons and rank
- Vector Reranker: Fast ranking based on vector similarity
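For intuition about the vector-based path, a minimal cosine-similarity reranker could look like the sketch below (pure Python, stdlib only; the repo's actual vector reranker interface may differ):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_rerank(user_vec, candidates):
    """Sort (item_id, item_vec) pairs by similarity to the user vector, best first."""
    return sorted(candidates, key=lambda c: cosine(user_vec, c[1]), reverse=True)

user_vec = [1.0, 0.0]
candidates = [("a", [0.0, 1.0]), ("b", [1.0, 0.1]), ("c", [0.5, 0.5])]
print([item for item, _ in vector_rerank(user_vec, candidates)])  # ['b', 'c', 'a']
```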
```python
from src.models.reranker_llm import LLMReranker

reranker = LLMReranker(llm_client)
ranked_items = reranker.rerank(user_profile, candidates, reasons)
```

Add new domain rule files in src/memory/domain_rules/:
```python
# src/memory/domain_rules/custom_rules.py
def get_custom_rules():
    return {
        'user_preference': 'weight=0.8',
        'item_quality': 'weight=0.7',
        'recency': 'weight=0.6',
        # Add more rules...
    }
```

Increase parallel workers to accelerate evaluation:
```bash
python scripts/run_train.py \
    --model memrec_agent \
    --dataset instructrec-books \
    --config configs/memrec_instructrec-books.yaml \
    --parallel \
    --parallel_workers 32  # Adjust based on CPU cores
```

Problem: LLM API calls fail. Solution:
- Check if environment variables are correctly set
- Verify API key validity
- Check network connection and API quota

```bash
# Verify environment variables
echo $AZURE_OPENAI_ENDPOINT
echo $AZURE_OPENAI_API_KEY
```

Problem: evaluation is slow. Solution:
- Use the vector reranker: set `reranker_mode: vector` in the config
- Reduce evaluation users: `--n_eval_users 100`
- Increase parallel workers: `--parallel --parallel_workers 16`
Ensure you are using the same configuration:

```bash
# For full evaluation (all test users)
python scripts/run_train.py \
    --model memrec_agent \
    --dataset instructrec-books \
    --config configs/memrec_instructrec-books.yaml \
    --seed 42

# For 1k sampled users evaluation (for reproducibility)
python scripts/run_train.py \
    --model memrec_agent \
    --dataset instructrec-books \
    --config configs/memrec_instructrec-books_1k.yaml \
    --seed 42
```

Note for 1k evaluation: the memrec_instructrec-books_1k.yaml config file already includes eval_user_list: eval_user_sample_1k_instructrec-books.json. Ensure this JSON file is placed in the project root directory (same level as scripts/ and configs/).
- Prepare data in the unified format (user_id, item_id, rating, timestamp)
- Map user and item IDs to 0-based integers
- Save as a `.inter` file to `data/processed/your-dataset/`
- Copy and modify a configuration file
- Run the training script
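The preparation steps above can be sketched as follows, assuming the `.inter` file is tab-separated with a header row (verify the exact separator and header names against the converted InstructRec files):

```python
import csv
from pathlib import Path

def write_inter(path, interactions):
    """Write (user_id, item_id, rating, timestamp) rows as a tab-separated .inter file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["user_id", "item_id", "rating", "timestamp"])
        writer.writerows(interactions)

# IDs already mapped to 0-based integers
rows = [(0, 0, 5.0, 1700000000), (0, 1, 3.0, 1700000100), (1, 0, 4.0, 1700000200)]
write_inter("data/processed/your-dataset/your-dataset.inter", rows)
```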