Skip to content

Why is it necessary to combine candidate pools obtained with filter and unfiltered criteria in FilteredVamana indexing? #682

@SplashCloud

Description

@SplashCloud

The code is in the function search_for_point_and_prune in index.cpp file:

        std::shared_lock<std::shared_timed_mutex> tl(_tag_lock, std::defer_lock);
        if (_dynamic_index)
            tl.lock();
        std::vector<uint32_t> filter_specific_start_nodes;
        for (auto &x : _location_to_labels[location])
            filter_specific_start_nodes.emplace_back(_label_to_start_id[x]);

        if (_dynamic_index)
            tl.unlock();

        _data_store->get_vector(location, scratch->aligned_query());
        iterate_to_fixed_point(scratch, filteredLindex, filter_specific_start_nodes, true,
                               _location_to_labels[location], false);
        
        // Here
        // combine candidate pools obtained with filter and unfiltered criteria.
        std::set<Neighbor> best_candidate_pool;
        for (auto filtered_neighbor : scratch->pool())
        {
            best_candidate_pool.insert(filtered_neighbor);
        }

        // clear scratch for finding unfiltered candidates
        scratch->clear();

        _data_store->get_vector(location, scratch->aligned_query());
        iterate_to_fixed_point(scratch, Lindex, init_ids, false, unused_filter_label, false);

        for (auto unfiltered_neighbour : scratch->pool())
        {
            // insert if this neighbour is not already in best_candidate_pool
            if (best_candidate_pool.find(unfiltered_neighbour) == best_candidate_pool.end())
            {
                best_candidate_pool.insert(unfiltered_neighbour);
            }
        }

        scratch->pool().clear();
        std::copy(best_candidate_pool.begin(), best_candidate_pool.end(), std::back_inserter(scratch->pool()));

The algorithm in the paper does not mention using the results of unfiltered search as part of the neighbor candidate set.
Then, I commented out the code for the unfiltered search and only used the results of the filtered search as neighbor candidates for pruning, and the constructed index performed better:

# combine filtered search and unfiltered search
Using 10 threads to search
  Ls         QPS     Avg dist cmps  Mean Latency (mus)   99.9 Latency   Recall@10
=================================================================================
  10    12354.06             74.61              807.78        1561.57       16.92
  50     4896.18            230.08             2040.62        3916.28       52.96
 100     3160.04            361.73             3162.08        5720.58       67.57
 200     1895.10            597.24             5273.57        8610.87       79.11
 300     1348.74            820.74             7409.41       10601.51       84.14
 400     1045.22           1037.71             9560.64       12604.73       86.97
 500      850.17           1249.79            11755.75       15468.46       88.82
 600      713.07           1457.35            14015.92       17650.42       90.14
 650      648.90           1559.70            15401.10       18695.55       90.68

# only use filtered search
Using 10 threads to search
  Ls         QPS     Avg dist cmps  Mean Latency (mus)   99.9 Latency   Recall@10
=================================================================================
  10    11770.21            125.70              847.92        1624.26       31.96
  50     5101.96            336.21             1957.79        3600.15       70.68
 100     3246.70            526.91             3077.13        4975.10       82.16
 200     1896.82            872.66             5268.55        7761.59       89.84
 300     1331.29           1195.33             7506.48       10179.60       92.85
 400     1025.38           1501.36             9746.72       11969.66       94.38
 500      830.82           1793.70            12028.78       14463.68       95.29
 600      699.44           2076.82            14289.13       16531.08       95.94
 650      647.19           2214.84            15442.54       17868.96       96.20

I want to know why the results of unfiltered search are considered here. Is it to be compatible with query searches without filters?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions