feat: semantic deduplication

Embedding-based near-duplicate detection, using FAISS or Annoy for nearest-neighbor search, catches redundant rows that exact-match deduplication misses.

A configurable similarity-score threshold decides which rows count as near-duplicates and are removed.
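
A minimal sketch of the thresholding step, assuming rows are already embedded as float vectors and that "similarity score" means cosine similarity. The names `cosine_similarity`, `dedupe_rows`, and the default threshold of 0.95 are illustrative, not part of this change. The brute-force comparison stands in for the FAISS or Annoy lookup, which would replace the inner loop on larger datasets.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe_rows(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical default; tune per dataset
) -> List[int]:
    """Return the indices of rows to keep, in original order.

    A row is dropped when its similarity to any already-kept row
    is greater than or equal to `threshold`.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        # Brute-force scan over kept rows; an ANN index would replace this.
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    rows = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [1.0, 0.0]]
    print(dedupe_rows(rows, threshold=0.95))
```

Lowering the threshold removes more rows and raises the risk of dropping distinct ones; raising it keeps more near-duplicates. The first occurrence of each group is the one retained.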