@julochrobak
Contributor

This improves the performance of the fuzzy function by not allocating the matrix on every call. Instead, the matrix is allocated on the first call and reused on subsequent calls; if a bigger matrix is required, it is reallocated.

To allow parallel execution, we cannot store the matrix in a single global variable. Instead, I've introduced a State property on the Func structure, which lets every function keep its own state and reuse it across calls.
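The reuse scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `Func` type shown here, its `Name` field, and the helpers `matrix` and `matrixFor` are hypothetical. The matrix is kept as one flat slice in `Func.State` and is reallocated only when a call needs more cells than the current capacity.

```go
package main

// Func is a hypothetical stand-in for the Func structure mentioned in the PR;
// the real type's fields are not shown in this thread.
type Func struct {
	Name  string
	State interface{} // per-function state reused across calls
}

// matrix is a reusable rows x cols distance matrix backed by one flat slice.
type matrix struct {
	rows, cols int
	data       []int
}

// matrixFor returns a matrix with rows*cols cells, reusing the buffer stored
// in f.State and allocating a new one only when the old one is too small.
func matrixFor(f *Func, rows, cols int) *matrix {
	m, _ := f.State.(*matrix) // nil on the first call
	if m == nil || cap(m.data) < rows*cols {
		m = &matrix{data: make([]int, rows*cols)}
		f.State = m
	}
	m.rows, m.cols = rows, cols
	m.data = m.data[:rows*cols] // reslice within the existing capacity
	return m
}
```

Because the buffer belongs to a particular `Func` value, calls on different values can run in parallel; concurrent calls on the same value would still share, and race on, one buffer.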

Another small performance improvement comes from using our own integer min/max functions instead of the built-in Min/Max, which operate on floats.
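A minimal sketch of such a helper, assuming the "built-in Min/Max on floats" refers to Go's `math.Min` and `math.Max`, which accept and return `float64`. The name `min3` is hypothetical. Working directly on ints avoids converting every distance to `float64` and back.

```go
package main

// min3 returns the smallest of three ints without the float64 round trip
// that math.Min would require.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}
```

Since Go 1.21 the language also provides generic built-in `min` and `max` functions that work on ints directly.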

@ostap
Owner

ostap commented Dec 10, 2015

As tempting as it looks, I would rather not introduce the State property just for the sake of performance. It has been a while, but if you are still interested in optimizing the approximate string matching, there is a different way to approach it. Instead of using len(s)*len(t) space to store the matrix d, you can get by with just 2 * min(len(s), len(t)) by keeping only two rows or columns (whichever is shorter). Here is the main loop over the matrix:

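```go
// Assumes m = len(s)+1, n = len(t)+1, and that row 0 and column 0 of d
// are already initialized (d[i][0] = i, d[0][j] = j).
// min is a three-way minimum (a helper, or the Go 1.21+ built-in).
for j := 1; j < n; j++ {
	for i := 1; i < m; i++ {
		if s[i-1] == t[j-1] {
			d[i][j] = d[i-1][j-1] // no operation required
		} else {
			d[i][j] = min(
				d[i-1][j]+1,   // a deletion
				d[i][j-1]+1,   // an insertion
				d[i-1][j-1]+1) // a substitution
		}
	}
}
```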
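For reference, here is a self-contained version of the full-matrix computation built around the loop above. It is a sketch under assumptions: the function name `levenshtein` and the helper `min3` are hypothetical, the initialization of row 0 and column 0 is assumed because the excerpt does not show it, and strings are compared byte by byte rather than rune by rune. It allocates `(len(s)+1) * (len(t)+1)` cells on every call, which is the cost the PR tries to avoid.

```go
package main

// levenshtein computes the edit distance between s and t using the full
// (len(s)+1) x (len(t)+1) matrix d, following the loop structure above.
func levenshtein(s, t string) int {
	m, n := len(s)+1, len(t)+1
	d := make([][]int, m)
	for i := range d {
		d[i] = make([]int, n)
		d[i][0] = i // delete the first i bytes of s
	}
	for j := 0; j < n; j++ {
		d[0][j] = j // insert the first j bytes of t
	}
	for j := 1; j < n; j++ {
		for i := 1; i < m; i++ {
			if s[i-1] == t[j-1] {
				d[i][j] = d[i-1][j-1] // no operation required
			} else {
				d[i][j] = min3(
					d[i-1][j]+1,   // a deletion
					d[i][j-1]+1,   // an insertion
					d[i-1][j-1]+1) // a substitution
			}
		}
	}
	return d[m-1][n-1]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}
```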

Note that the indices only reach back as far as i-1 and j-1, so computing column j never needs anything older than column j-1.
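Since each cell depends only on the current and the previous column, the matrix can be replaced by two slices, in the spirit of the suggestion above. The sketch below is one possible way to do it, not code from the repository; `levenshteinTwoRows` and `min3` are hypothetical names. It swaps the inputs so the slices are sized by the shorter string, which is valid because edit distance is symmetric, and it swaps the two slices after each step instead of copying.

```go
package main

// levenshteinTwoRows computes the same distance as the full-matrix loop but
// keeps only two slices of min(len(s), len(t))+1 ints.
func levenshteinTwoRows(s, t string) int {
	// Make s the shorter string so the slices are as small as possible.
	if len(s) > len(t) {
		s, t = t, s
	}
	prev := make([]int, len(s)+1) // column j-1 of the full matrix
	curr := make([]int, len(s)+1) // column j of the full matrix
	for i := range prev {
		prev[i] = i // column 0: delete the first i bytes of s
	}
	for j := 1; j <= len(t); j++ {
		curr[0] = j // row 0: insert the first j bytes of t
		for i := 1; i <= len(s); i++ {
			if s[i-1] == t[j-1] {
				curr[i] = prev[i-1] // no operation required
			} else {
				curr[i] = min3(
					curr[i-1]+1, // a deletion   (d[i-1][j])
					prev[i]+1,   // an insertion (d[i][j-1])
					prev[i-1]+1) // a substitution (d[i-1][j-1])
			}
		}
		prev, curr = curr, prev // the finished column becomes "previous"
	}
	return prev[len(s)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}
```

For example, `levenshteinTwoRows("kitten", "sitting")` should return 3, the same result as the full-matrix version.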
