Description
It would be very useful for BQL to have a FILTER keyword that lets us filter out part of a query's results at a level closer to the storage (closer to the driver), improving performance.
Some of the functionality this FILTER keyword could support:
- Filtering only the triples with immutable predicates for our query result.
The syntax for that could be something like:
FILTER isImmutable(?p)
It would come inside or after the WHERE clause and specify that the predicate bound to ?p in our query must be immutable.
Another function, isTemporal, could work similarly.
This could address what was asked in issue #115.
- Filtering only the triples with the latest time anchor for our query result.
It is pretty common to be interested only in the latest triple of a time series. Instead of fetching all the triples, sorting them by time anchor in decreasing order and limiting the result to 1, as one has to do today, we could use FILTER to do that much more cheaply with a syntax like:
FILTER latest(?p)
This is a pretty common use case, already highlighted by issues #86 and #85.
It also opens the possibility of supporting the opposite, keeping only the earliest time anchor, as illustrated below.
FILTER earliest(?p)
One could also opt for a syntax that filters directly on the time binding, as in:
FILTER latest(?date)
- Allow using regular expressions for matching.
We could use regex for filtering too. The syntax could be something like:
FILTER match(?obj, "ab+"^^type:text)
This also relates to what was asked in issue #122.
- Filtering to satisfy comparisons (evaluated as boolean conditions).
For example:
FILTER greaterThan(?obj, "37"^^type:int64)
The functions lowerThan and equal would work analogously.
- Filtering to satisfy a combination of functions.
For example, one could write something like:
FILTER latest(lowerThan(?date, 2005-01-02T15:04:05.999999999Z07:00))
This would return only the latest element of a time series while also restricting the time interval to be before a given date.
Another approach would be to build a dedicated function like:
FILTER latestBeforeUpperBound(?date, 2005-01-02T15:04:05.999999999Z07:00)
- Others.
Other ideas for filtering functions could be along the lines of:
FILTER isToday(?date)
This would compare a binding with a value computed at runtime (the current day in this example).
These are just some examples. The FILTER keyword could make room for a number of other features in the future, as we discover new ones that would be handy and implement them as filtering functions (just like isImmutable and latest above).
The idea is for all FILTER functions to share a signature like the one below:
FILTER myFunction(?binding, <value>)
The <value> argument above is optional: some functions do not need it (isImmutable, for example).
This way, adding a new function requires no changes to the parser or to lookupOptions (which communicates with the driver and is defined in storage.go). Every FILTER function should be mapped to three variables there: operation, field (for the binding or its position in the clause) and value.
For other general ideas, one could take inspiration from SPARQL's FILTER keyword.
N.B.: This FILTER keyword differs from HAVING. FILTER would work closer to the storage/driver level to improve query performance while filtering the results, whereas HAVING works on aggregated data at a higher level, farther from the driver (as when using functions such as sum and count to write HAVING conditions).
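The mapping to operation, field and value could be sketched in Go roughly as follows. Everything here is an assumption for illustration: FilterOptions and parseFilter are hypothetical names, not existing BadWolf types or functions, and the sketch only handles the flat myFunction(?binding, <value>) form, not nested calls such as latest(lowerThan(...)).

```go
package main

import (
	"fmt"
	"strings"
)

// FilterOptions is a hypothetical container for the three values the
// proposal mentions. It is not part of BadWolf's storage.go.
type FilterOptions struct {
	Operation string // function name, e.g. "latest" or "greaterThan"
	Field     string // binding the filter applies to, e.g. "?p"
	Value     string // optional literal; empty when the function takes none
}

// parseFilter turns the text that follows the FILTER keyword, such as
// `latest(?p)` or `greaterThan(?obj, "37"^^type:int64)`, into FilterOptions.
// It accepts any function name, so a new function needs no parser change.
func parseFilter(call string) (FilterOptions, error) {
	call = strings.TrimSpace(call)
	open := strings.Index(call, "(")
	if open <= 0 || !strings.HasSuffix(call, ")") {
		return FilterOptions{}, fmt.Errorf("malformed filter call %q", call)
	}
	op := strings.TrimSpace(call[:open])
	args := call[open+1 : len(call)-1]
	// Split on the first comma only, so the value may itself contain commas.
	field, value, _ := strings.Cut(args, ",")
	field = strings.TrimSpace(field)
	value = strings.TrimSpace(value)
	if !strings.HasPrefix(field, "?") {
		return FilterOptions{}, fmt.Errorf("first argument %q is not a binding", field)
	}
	return FilterOptions{Operation: op, Field: field, Value: value}, nil
}

func main() {
	for _, call := range []string{
		`isImmutable(?p)`,
		`greaterThan(?obj, "37"^^type:int64)`,
	} {
		opts, err := parseFilter(call)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", opts)
	}
}
```

With this shape, the driver would receive the same three fields for every filter and decide by itself which operations it supports, so new filtering functions would only have to be implemented on the driver side.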