-
Notifications
You must be signed in to change notification settings - Fork 7
Description
It turns out the SourceSet / SourceSetIterator and QueryResultIterator types are more general in their application that just queries. PR #745 renames QueryResultIterator to EntitySetIterator in preparation for a general type EntitySet.
The EntitySet type lifts SourceSets to an algrebra of sets, implementing efficient set operations. These are useful in their own right (e.g. if you want to take the intersection of the results of two queries) but can also be used to implement disjunction (OR), conjunction (AND), and complement (NOT) in queries.
The SourceSet type is an internal type only which wraps the lowest level concrete types that can be interpreted as a set (in the mathematical sense) of entities, even if only abstractly. The fundamental requirements for something to be a SourceSet are:
- ability to test membership:
source_set.contains(entity_id)(virtually always very efficient) - ability to iterate over elements:
source_set.iter().collect<Vec<PersonId>>()(not always efficient) - iteration yields unique entity IDs, i.e. a particular entity ID is yielded at most once. (This turns out to be a natural property—we have it at zero cost for source sets.)
The EntitySet type lifts source sets to an algebra of sets that maintain these three properties.
An initial implementation can be naïve, but a sophisticated implementation will exploit the fact that both set membership and iteration involve a binary decision procedure on sentences of boolean variables (which represent membership tests on the underlying source sets). The theory for optimal evaluation of these sentences is very well developed. (Roughly speaking, we have a Reduced Ordered Binary Decision Diagram, ROBDD.)
It also makes sense for the EntitySet type to replace the concrete IndexSet type used in APIs that give access to result sets (currently only with_query_result).
I have a lot more notes on this that are still evolving.