@achimala
Owner

Correctly matches entities within compound words to prevent incorrect fuzzy entity resolution.

Previously, the entity resolver only matched whole words, causing entities like "darien" to be missed in compound terms like "superdarien". This could lead to an unrelated entity (e.g., "cody") being selected via fuzzy matching. The update introduces substring matching for compound words with safeguards to avoid false positives and typo matches.



- Added substring matching logic to detect entity names within compound words
- Only matches if entity name is at least 3 chars and word is 2+ chars longer
- Excludes matches where entity name appears at start of word (to avoid typo matches)
- Added comprehensive unit tests to verify the fix and prevent regressions

Co-authored-by: Anshu Chimala <achimala@gmail.com>
@cursor

cursor bot commented Feb 10, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@achimala
Owner Author

@cursoragent i don't understand. can you explain? "Darien" was in the prompt, so why didn't that entity match? and why did cody match?

@cursor

cursor bot commented Feb 10, 2026

Explanation

"Darien" should have matched exactly

The prompt contains "Darien" twice. The matching logic:

  1. Extracts words: ["samebot", "draw", "Darien", ..., "superdarien", ..., "Darien", ...]
  2. Normalizes "Darien" → "darien" (lowercase, punctuation removed)
  3. Checks for an exact match against entity folder names

If there's an entity folder named "darien" (case-insensitive), "Darien" should match exactly with a score of 1.0.
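The three steps above can be sketched roughly as follows. This is an illustration in Python, not the repository's actual code: the function names `normalize` and `exact_matches`, the whitespace tokenization, and the sample prompt are all assumptions made for the example.

```python
import string


def normalize(word: str) -> str:
    """Lowercase a word and strip ASCII punctuation (assumed normalization)."""
    return word.lower().translate(str.maketrans("", "", string.punctuation))


def exact_matches(prompt: str, entity_names: set[str]) -> dict[str, float]:
    """Return {entity: 1.0} for every entity whose name equals a prompt word."""
    names = {name.lower() for name in entity_names}  # case-insensitive folders
    scores: dict[str, float] = {}
    for raw in prompt.split():  # hypothetical tokenizer: split on whitespace
        word = normalize(raw)
        if word in names:
            scores[word] = 1.0  # an exact match gets the top score
    return scores


# Hypothetical prompt; the real prompt is not shown in this thread.
result = exact_matches(
    "samebot draw Darien, riding superdarien with Darien!",
    {"darien", "cody"},
)
```

Under these assumptions "Darien," and "Darien!" both normalize to "darien" and match, while "superdarien" does not match exactly and falls through to the later stages.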

Why "cody" might have matched instead

Before the fix, if "Darien" didn't match (e.g., entity folder name mismatch or normalization issue), the word "superdarien" would:

  1. Not match "darien" exactly (it's a different word)
  2. Not match "darien" as a substring (that logic didn't exist)
  3. Go through fuzzy matching, which could incorrectly match "cody" if it was the closest match

The fix

The fix adds substring matching so "superdarien" matches "darien" as a substring. This ensures:

  • "Darien" matches exactly (if the entity exists)
  • "superdarien" matches "darien" as a substring (not "cody")
  • Both contribute to matching "darien", preventing incorrect fuzzy matches
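A minimal sketch of the substring rule, using the safeguards listed in the commit message (entity name at least 3 chars, word at least 2 chars longer, no match when the name sits at the start of the word). It is written in Python for illustration only; `substring_match` is a hypothetical name, and reading "appears at start of word" as a simple prefix check is an assumption about how the real code handles it.

```python
def substring_match(word: str, entity: str) -> bool:
    """Return True if `entity` is embedded in the compound `word`."""
    word, entity = word.lower(), entity.lower()
    if len(entity) < 3:
        return False  # very short names would match almost anything
    if len(word) < len(entity) + 2:
        return False  # word must be at least 2 chars longer than the name
    if word.startswith(entity):
        return False  # prefix hits look like typos or plurals, so skip them
    return entity in word


is_compound_hit = substring_match("superdarien", "darien")
is_cody_hit = substring_match("superdarien", "cody")
```

With this rule "superdarien" resolves to "darien" before fuzzy matching runs, and "cody" is never considered for that word. A word such as "dariens" is rejected by both the length check and the prefix check.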

To debug further

Check:

  1. What is the exact name of the "darien" entity folder in your database?
  2. Are there any aliases configured for "darien" or "cody"?
  3. What was the actual matched entity in the logs?

The fix should prevent "cody" from matching when "darien" is present in the prompt, but if "Darien" didn't match exactly, that suggests a separate issue (folder naming or normalization).
