[FR] Allow inequality conditions in `block_by`

**Is your feature request related to a problem? Please describe.**
It is nice to be able to use `block_by` to filter out some comparisons before computing the string similarity. Currently, it is limited to equality conditions (e.g., rows must have the same year to be considered for matching). I have a setting in which I don't want to compare rows if `year.y` is larger than `year.x`, i.e., I'd like to block matching by the condition `year.x > year.y`.

I don't know how hard that would be to implement. I was able to trace the usage of `block_by` (renamed `salt` in the internals) down to the `new()` method of the `Shingleset` struct in Rust, but I don't know exactly how it works from there.

**Describe the solution you'd like**
I'm not sure what syntax this should take. It could be something similar to `dplyr::join_by()`, so that one could write `block_by = block_by(year.x > year.y)`. Note that this only works if the variable has a different name in each dataset.

**Describe alternatives you've considered**
Currently, I don't use `block_by`. Instead, I match on the full data and filter the results by my condition afterwards.
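
To make the workaround concrete, here is a minimal sketch in base R. The toy data and column names are made up for illustration, and a plain cross join (`merge(..., by = NULL)`) stands in for the actual fuzzy join, so it only shows the post-hoc filtering step, not the string matching itself.

```r
# Toy data; names and values are invented for illustration only.
a <- data.frame(name = c("Acme Corp", "Globex"), year = c(2010, 2005))
b <- data.frame(name = c("Acme Corporation", "Globex Inc"), year = c(2008, 2007))

# Stand-in for the fuzzy join: a cross join producing every candidate pair.
# Columns shared by both tables receive the .x / .y suffixes.
pairs <- merge(a, b, by = NULL, suffixes = c(".x", ".y"))

# Post-hoc filter: keep only the pairs that satisfy the inequality.
kept <- pairs[pairs$year.x > pairs$year.y, ]
print(kept)
```

The drawback is that every pair is compared before the filter is applied, which is exactly the work an inequality condition in `block_by` would avoid.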

**Additional context**
/ 


[FR] Allow inequality conditions in `block_by` #100
