Description
The methods extractOne and extractTop return different results for the same query string and collection.
I search a fairly long collection (15k strings) for each query.
For instance, consider the following scenario:
Query - ABC 1721
The collection contains the following strings:
ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC
and many more
extractOne("ABC 1721", collection)
gives - ABC1721, Ratio - 95
extractTop("ABC 1721", collection,1)
gives - ABC1721, Ratio - 95
The problem arises when I ask for the top 5 results:
extractTop("ABC 1721", collection,5)
Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on
I also tried extractSorted; its results are not consistent with extractOne either.
I ran extractTop (top 5) and extractOne on 1000+ queries. For around 70% of them, the first match from extractTop differs from the result of extractOne.
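The expectation behind this report can be sketched without the library. The snippet below is only an illustration under stated assumptions: `score` is a stand-in scorer based on Levenshtein distance (not the library's ratio, so its numbers will not match the 95 and 86 above), and the `extractOne` and `extractTop` methods here are hypothetical local helpers that merely share names with the library calls. The point is the invariant: when both use the same scorer, the first entry of a top-k list should equal the single best match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopKConsistency {

    // Stand-in scorer (an assumption, NOT the library's ratio):
    // 100 * (maxLen - levenshteinDistance) / maxLen, in integer arithmetic.
    static int score(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 100;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return 100 * (maxLen - prev[b.length()]) / maxLen;
    }

    // Hypothetical helper: the single best-scoring choice (first one wins ties).
    static String extractOne(String query, List<String> choices) {
        String best = null;
        int bestScore = -1;
        for (String c : choices) {
            int s = score(query, c);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    // Hypothetical helper: the `limit` best choices, highest score first.
    // List.sort is stable, so ties keep their original order.
    static List<String> extractTop(String query, List<String> choices, int limit) {
        List<String> sorted = new ArrayList<>(choices);
        sorted.sort(Comparator.comparingInt((String c) -> score(query, c)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> collection = Arrays.asList(
                "ABC1721", "ABC1721-FGH/L9", "ABC MERAKI Z1", "EFGD3111/Z1-ABC");
        String one = extractOne("ABC 1721", collection);
        List<String> top5 = extractTop("ABC 1721", collection, 5);
        // Expected invariant: the best single match leads the top-k list.
        System.out.println(one.equals(top5.get(0)));
    }
}
```

Running the same comparison against the real extractOne and extractTop over the full collection is what produced the roughly 70% mismatch described above.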
BTW, thank you for your effort in porting the Python logic to Java without any performance lag.