Speed-up Indexing with custom metrics #245

@akurniawan

Hi, thanks for the great package! I'm trying to build a kNN index with a custom DTW (Dynamic Time Warping) metric, but building the index for just 90k data points would take around 5 hours. Do you have any suggestions for speeding this up?

Below is the code I used to build the index and run a query:

import numpy as np
from numba import jit

@jit(nopython=True, fastmath=True)
def dtw_numba(x, y):
    """
    Compute the Dynamic Time Warping (DTW) distance between two sequences.
    
    Parameters:
    x : array-like
        First sequence.
    y : array-like
        Second sequence.
        
    Returns:
    float
        The DTW distance between sequences x and y.
    """
    n, m = len(x), len(y)
    dtw_matrix = np.full((n + 1, m + 1), np.inf)
    dtw_matrix[0, 0] = 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            dtw_matrix[i, j] = cost + min(dtw_matrix[i - 1, j],    # insertion
                                          dtw_matrix[i, j - 1],    # deletion
                                          dtw_matrix[i - 1, j - 1]) # match

    return np.sqrt(dtw_matrix[n, m])

import pynndescent

index = pynndescent.NNDescent(flat_inj_vecs, metric=dtw_numba)

index.query([flat_vecs[10]], k=100)

Thank you!
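
The `dtw_numba` function above allocates a full (n + 1) x (m + 1) matrix on every call, although each row of the recurrence only reads the row before it. A minimal sketch of a two-row variant follows. The name `dtw_two_row` is hypothetical, the `njit` fallback is only there so the snippet runs without numba installed, and whether this shortens the index build in practice has not been measured here.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without numba.
    def njit(func):
        return func


# fastmath is left off in this sketch because the recurrence relies on np.inf.
@njit
def dtw_two_row(x, y):
    """DTW distance using two rolling rows instead of the full cost matrix."""
    n, m = len(x), len(y)
    prev = np.full(m + 1, np.inf)  # row i - 1 of the cost matrix
    curr = np.full(m + 1, np.inf)  # row i of the cost matrix
    prev[0] = 0.0

    for i in range(1, n + 1):
        curr[0] = np.inf  # column 0 is unreachable for every i >= 1
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j],      # insertion
                                 curr[j - 1],  # deletion
                                 prev[j - 1])  # match
        # The row just filled becomes the "previous" row for the next i.
        prev, curr = curr, prev

    return np.sqrt(prev[m])
```

It returns the same value as the full-matrix version, so it could be passed the same way, as `metric=dtw_two_row`. This only reduces the per-call allocation; the cost of each distance evaluation is still O(n * m).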
