Inconsistent results from extractOne and extractTop #83

@eswarn24

I get different results from extractOne and extractTop when they are called with the same query string and collection.

I have a fairly long collection (15k strings) that is searched for each query.

For instance, take the following scenario:
Query - ABC 1721
The collection contains the following strings:
ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC
and many more

extractOne("ABC 1721", collection)
gives - ABC1721, Ratio - 95

extractTop("ABC 1721", collection,1)
gives - ABC1721, Ratio - 95

But the problem arises when I ask for the top 5 results:
extractTop("ABC 1721", collection,5)
Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on

I tried 'extractSorted' as well; its results are not consistent with extractOne either.

I ran extractTop (top 5) and extractOne on 1000+ queries. For around 70% of them, the first match from extractTop differs from the result of extractOne.
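A comparison like the one above can be sketched as a small harness that counts, per query, whether the best-one score equals the first top-k score. This is only a sketch under assumptions: `BestScore`, `TopScores`, `toyScore`, `referenceBest`, and `referenceTop` are hypothetical names introduced for illustration, and `toyScore` is a stand-in scorer, not the library's ratio. It compares scores rather than strings so that ties between equally scored candidates are not counted as mismatches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ExtractConsistencyCheck {

    /** Hypothetical adapter around an "extract one" call: score of the best match. */
    interface BestScore {
        int of(String query, List<String> choices);
    }

    /** Hypothetical adapter around an "extract top k" call: scores, best first. */
    interface TopScores {
        List<Integer> of(String query, List<String> choices, int k);
    }

    /** Counts queries whose best-one score differs from the first top-k score. */
    static int countMismatches(List<String> queries, List<String> choices, int k,
                               BestScore one, TopScores top) {
        int mismatches = 0;
        for (String q : queries) {
            List<Integer> topScores = top.of(q, choices, k);
            if (topScores.isEmpty() || topScores.get(0) != one.of(q, choices)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    /** Toy scorer (NOT the library's ratio): shared prefix length after normalizing. */
    static int toyScore(String a, String b) {
        String x = a.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
        String y = b.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
        int i = 0;
        while (i < x.length() && i < y.length() && x.charAt(i) == y.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Reference "best one": maximum score over all choices. */
    static int referenceBest(String query, List<String> choices) {
        int best = Integer.MIN_VALUE;
        for (String c : choices) {
            best = Math.max(best, toyScore(query, c));
        }
        return best;
    }

    /** Reference "top k": score everything, sort descending, keep the first k. */
    static List<Integer> referenceTop(String query, List<String> choices, int k) {
        List<Integer> scores = new ArrayList<>();
        for (String c : choices) {
            scores.add(toyScore(query, c));
        }
        scores.sort(Collections.reverseOrder());
        return new ArrayList<>(scores.subList(0, Math.min(k, scores.size())));
    }

    public static void main(String[] args) {
        List<String> choices = Arrays.asList(
                "ABC1721", "ABC1721-FGH/L9", "ABC MERAKI Z1", "EFGD3111/Z1-ABC");
        List<String> queries = Arrays.asList("ABC 1721");
        int bad = countMismatches(queries, choices, 5,
                ExtractConsistencyCheck::referenceBest,
                ExtractConsistencyCheck::referenceTop);
        System.out.println("mismatches: " + bad);
    }
}
```

To run this against the library instead of the reference methods, the two adapters could wrap extractOne and extractTop and return the scores of their results. With a correct top-k, the mismatch count should be zero for any k >= 1, which is the property the 70% figure above violates.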

BTW, I appreciate your effort in porting the Python logic to Java without any performance lag.
