
fix: add deduplication for episodic/event_log write and foresight expiry cleanup#102

Open
r266-tech wants to merge 1 commit into EverMind-AI:main from r266-tech:fix/write-pipeline-dedup

Conversation

@r266-tech

Summary

Fixes #95 — Memory write pipeline: add deduplication for episodic/event_log and expiry cleanup for foresight.

Problem

  1. Duplicate records: When the same MemCell is processed more than once, both the episodic_memory and event_log collections accumulate duplicate entries with the same parent_id, which degrades retrieval ranking quality.

  2. Stale foresight: ForesightRecord has a validity window (start_time / end_time), but expired records are never deleted — they just accumulate dead data across MongoDB, Elasticsearch, and Milvus.

Changes

1. Delete-before-insert dedup in save_memory_docs()

In src/biz_layer/mem_memorize.py, before inserting new docs:

  • Episodic memory: delete existing records with matching parent_id from MongoDB, ES, and Milvus
  • Event log: same delete-before-insert by parent_id across all three stores
  • Dedup is best-effort: failures are logged as warnings but do not block the insert
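
The delete-before-insert flow can be sketched as follows. This is a minimal illustration, not the actual `save_memory_docs()` implementation: the function and repository names (`save_with_dedup`, `delete_by_parent_id`, `insert_many`) stand in for whatever interfaces the real MongoDB/ES/Milvus repositories expose.

```python
import logging

logger = logging.getLogger(__name__)

def save_with_dedup(docs, parent_id, repos):
    """Sketch of best-effort delete-before-insert dedup.

    `repos` is a hypothetical list of repository objects, one per store
    (MongoDB, Elasticsearch, Milvus), each exposing delete_by_parent_id()
    and insert_many(); these names are illustrative.
    """
    for repo in repos:
        try:
            # Purge any prior records sharing this parent_id.
            repo.delete_by_parent_id(parent_id)
        except Exception as exc:
            # Dedup is best-effort: log a warning and continue,
            # so a failing store never blocks the insert.
            logger.warning("dedup failed for %r: %s", repo, exc)
    for repo in repos:
        repo.insert_many(docs)
```

The key property is that a dedup failure in one store is swallowed (after logging) rather than propagated, so the subsequent insert always runs.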

2. Foresight expiry cleanup

New cleanup_expired_foresights() function that:

  • Queries MongoDB for ForesightRecords where end_time < today
  • Deletes those records from MongoDB, Elasticsearch, and Milvus
  • Returns the count of deleted records
  • Can be invoked periodically (e.g., via cron/scheduler)
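
The cleanup described above can be sketched like this. The repository methods (`find_expired`, `delete_by_ids`) are illustrative stand-ins for the real MongoDB/ES/Milvus repository interfaces, which this sketch does not claim to match.

```python
from datetime import date, datetime

def cleanup_expired_foresights(mongo_repo, es_repo, milvus_repo, today=None):
    """Sketch: remove ForesightRecords whose end_time < today from all
    three stores, returning the number of records deleted.
    """
    today = today or datetime.combine(date.today(), datetime.min.time())
    # MongoDB is treated as the source of truth for which ids are expired.
    expired = mongo_repo.find_expired(before=today)
    ids = [r["id"] for r in expired]
    if not ids:
        return 0
    # Delete the same ids from every store to keep them consistent.
    for repo in (mongo_repo, es_repo, milvus_repo):
        repo.delete_by_ids(ids)
    return len(ids)
```

Returning the deleted count makes the function easy to monitor when run from a cron job or scheduler.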

3. New delete_by_parent_id on EpisodicMemoryRawRepository

Added the missing method (EventLogRecordRawRepository already had this).
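
For illustration, the shape of such a method on an in-memory stand-in (the real EpisodicMemoryRawRepository talks to MongoDB; this class and its storage are hypothetical):

```python
class EpisodicMemoryRepoSketch:
    """In-memory stand-in for EpisodicMemoryRawRepository, used only to
    illustrate the delete_by_parent_id contract."""

    def __init__(self):
        self._records = []

    def insert(self, record):
        self._records.append(record)

    def delete_by_parent_id(self, parent_id):
        """Delete all records with the given parent_id; return the count."""
        before = len(self._records)
        self._records = [r for r in self._records
                         if r.get("parent_id") != parent_id]
        return before - len(self._records)
```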

4. Tests

tests/test_write_pipeline_dedup.py covers:

  • Episodic dedup deletes old records before insert
  • Dedup failure does not block insert (best-effort)
  • Event log dedup deletes old records before insert
  • Foresight cleanup removes expired records from all stores
  • Cleanup returns 0 when no expired records exist
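
A test of the "deletes old records before insert" property can use a mocked repository and assert on call order, along these lines (the `save_docs` helper is a stand-in for the pipeline step under test, not the actual code):

```python
from unittest.mock import MagicMock, call

def save_docs(docs, parent_id, repo):
    # Minimal stand-in for the dedup-then-insert pipeline step.
    repo.delete_by_parent_id(parent_id)
    repo.insert_many(docs)

def test_dedup_deletes_before_insert():
    repo = MagicMock()
    save_docs([{"parent_id": "p1"}], "p1", repo)
    # mock_calls preserves ordering, so this verifies delete ran first.
    assert repo.mock_calls == [
        call.delete_by_parent_id("p1"),
        call.insert_many([{"parent_id": "p1"}]),
    ]
```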

