[FEA] Hash map design that supports reduce by key

**Is your feature request related to a problem? Please describe.**

I would like to be able to implement a hash-based reduce-by-key operation. A `thrust::reduce_by_key` operation requires first doing a `sort_by_key`. This means 2 passes through the input data and `2n` memory overhead. A hash based implementation requires only a single pass and `n/0.7` memory overhead (assuming a 70% load factor).

Trying to implement a reduce-by-key with today's `static_map` implementation is effectively impossible. 

There's two ways you could implement a hash-based rbk (assume the reduction is `sum`).

The first assumes that `insert` returns an iterator to the slot where the inserted occurred, or the slot that contains an equivalent key:
```
template <typename Map, typename Key, typename Value>
hash_rbk(Map m, Key key_begin, Key key_end, Value value_begin){
   auto found = map.insert(key_begin + tid, value_begin + tid); // assume `insert` returns an iterator to the slot where the insert occurred
   
   // key already exists, accumulate with the existing value
   if(found.second){
      auto& found_pair = (found.first)
      found_pair.second.fetch_add(value_begin + tid);
   }
}
```

The second assumes you can do concurrent insert and find:

```
template <typename Map, typename Key, typename Value>
hash_rbk(Map m, Key key_begin, Key key_end, Value value_begin){

   auto found = map.find(key_begin + tid); // look if the key exists
   
   // key already exists, accumulate with the existing value
   if(found != map.end()){
      found.second.fetch_add(value_begin + tid);
   } else { // key doesn't exist
      auto found = map.insert(key_begin + tid, value_begin + tid);
      
      if(not found.second){ // someone else already did the insert, accumulate with their value
           found.first.second.fetch_add(value_begin+tid);
      }
   }
}
```

However, both of these are impossible because:
1. The `insert` function cannot return an iterator (due to the potential for a race condition caused by the back-to-back CAS algorithm), and instead returns a `bool`. 
   - This results from the fact that we use `pair<atomic<K>, atomic<V>>` as the slot storage type and use back to back CAS. If we instead used `atomic<pair<K,V>>` we could return an iterator
2. You cannot do concurrent insert/find operations.
   - This results from the fact that we use `pair<atomic<K>, atomic<V>>` as the slot storage type and use back to back CAS. If we instead used `atomic<pair<K,V>>` we could support concurrent insert and find. 

**Describe the solution you'd like**

Explore a hash map implementation that allows implementing a reduce by key. 

One option is to use a `atomic<pair<K,V>>` implementation, but that has it's own drawbacks. See https://docs.google.com/presentation/d/1_UJlQoDc985sj03grMroB2zcCbiiC7-ua_1Eqm4qrRU/edit?usp=sharing



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FEA] Hash map design that supports reduce by key #58

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEA] Hash map design that supports reduce by key #58

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions