Try speeding up weeder with a bloom filter #186
NorfairKing wants to merge 1 commit into ocharles:master
-- TODO maybe we can make this faster by only hashing the location.
instance Hashable Declaration where
  hashIO32 d s = hashIO32 (declarationStableName d) s
This is definitely not fast, and could be why this approach doesn't work.
Could be worth trying the Uniquable instance of Module and OccName to produce hashes.
Note [The Unique of an OccName]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
They are efficient, because FastStrings have unique Int# keys. We assume
this key is less than 2^24, and indeed FastStrings are allocated keys
sequentially starting at 0.
So we can make a Unique using
        mkUnique ns key :: Unique
where 'ns' is a Char representing the name space. This in turn makes it
easy to build an OccEnv.
-}
(or maybe even better: the Uniquable instance of the original Names we derive the declarations from with nameToDeclaration)
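To make the idea in the quoted note concrete, here is a minimal sketch of packing a name-space character and a small integer key into one machine integer, which is then trivially cheap to hash or compare. `UniqueSketch` and `mkUniqueSketch` are hypothetical stand-ins written for this illustration; they are not GHC's `Unique` or `mkUnique`, and the 24-bit layout is only taken from the wording of the note.

```haskell
module Main where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (ord)

-- | Hypothetical stand-in for a Unique: just a wrapped Int.
newtype UniqueSketch = UniqueSketch Int
  deriving (Eq, Ord, Show)

-- | Put the name-space character above the low 24 bits and the key
-- (assumed to be below 2^24, as the note says) in the low 24 bits.
mkUniqueSketch :: Char -> Int -> UniqueSketch
mkUniqueSketch ns key =
  UniqueSketch ((ord ns `shiftL` 24) .|. (key .&. 0xFFFFFF))

main :: IO ()
main = do
  let varF = mkUniqueSketch 'v' 5 -- same key, "variable" name space
      tcF = mkUniqueSketch 't' 5 -- same key, "type" name space
      varG = mkUniqueSketch 'v' 6 -- different key, same name space
  print (varF /= tcF)
  print (varF /= varG)
  print (varF == mkUniqueSketch 'v' 5)
```

Comparing or hashing such a value touches a single `Int`, whereas hashing a stable name built as a string has to walk every character.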
-- The elem docs say:
-- @
-- If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
-- @
-- I.e. if some declaration is a weed, it will definitely show up in the result, but also some weeds will show up in the result.
-- So we need to do another set difference afterwards, but with a much smaller set.
in Set.difference (Set.filter (not . (`BloomFilter.elem` bloom)) allDecls) usedDecls
I'm about 50% sure I got this backward in some way, so that might be part of the issue as well.
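One way to check the direction is to work it through on a toy filter. The sketch below assumes the filter is built from the used declarations, which the snippet above does not show. Under that assumption, a declaration that is not in the filter is definitely unused, while a declaration that is "maybe in" the filter can be either used or a false positive, so it is the "maybe in" half that needs the exact check. `Bloom`, `fromListB`, `elemB` and the two string hashes are hypothetical stand-ins for this illustration, not the bloomfilter package.

```haskell
module Main where

import Data.Char (ord)
import qualified Data.IntSet as IntSet
import qualified Data.Set as Set

-- | Toy Bloom filter: the number of bit positions and the set of occupied ones.
data Bloom = Bloom Int IntSet.IntSet

-- | Two cheap, non-negative string hashes reduced to bit positions.
hashes :: Int -> String -> [Int]
hashes size s = [h1 `mod` size, h2 `mod` size]
  where
    h1 = foldl (\acc c -> (acc * 31 + ord c) `mod` 1000003) 7 s
    h2 = foldl (\acc c -> (acc * 131 + ord c) `mod` 998244353) 1 s

fromListB :: Int -> [String] -> Bloom
fromListB size xs = Bloom size (IntSet.fromList (concatMap (hashes size) xs))

-- | True for every inserted value; may also be True for others.
elemB :: String -> Bloom -> Bool
elemB x (Bloom size bits) = all (`IntSet.member` bits) (hashes size x)

-- | Declarations outside the filter are weeds without further checks;
-- only the "maybe used" part is compared exactly against usedDecls.
weedsViaBloom :: Set.Set String -> Set.Set String -> Set.Set String
weedsViaBloom allDecls usedDecls =
  let bloom = fromListB 64 (Set.toList usedDecls)
      (maybeUsed, definitelyUnused) = Set.partition (`elemB` bloom) allDecls
   in definitelyUnused `Set.union` (maybeUsed `Set.difference` usedDecls)

main :: IO ()
main = do
  let allDecls = Set.fromList ["A.f", "A.g", "B.h", "B.k", "C.m"]
      usedDecls = Set.fromList ["A.f", "B.h"]
  -- The filtered version agrees with the plain set difference.
  print (weedsViaBloom allDecls usedDecls == Set.difference allDecls usedDecls)
```

With the filter built this way, keeping only the declarations for which `elem` is False would drop any weed that happens to be a false positive, and a set difference applied afterwards cannot bring it back.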
Another idea that might help make this work: We're now using both a hash-based and a non-hash-based way of putting declarations in a set. If we could re-use the hash of a declaration then we would only need to hash it once and compare only the hash.
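A minimal sketch of what re-using the hash could look like: compute the hash once when the value is created, store it next to the value, and let equality and ordering look at the cached hash first. `Hashed`, `mkHashed` and `hashString` are hypothetical names for this illustration, and declarations are modelled as plain strings.

```haskell
module Main where

import Data.Char (ord)
import qualified Data.Set as Set

-- | A value paired with a hash that is computed once and then cached.
data Hashed a = Hashed !Int a

hashedValue :: Hashed a -> a
hashedValue (Hashed _ x) = x

-- | Compare the cached hashes first; fall back to the values only on a tie.
instance Eq a => Eq (Hashed a) where
  Hashed h1 x == Hashed h2 y = h1 == h2 && x == y

instance Ord a => Ord (Hashed a) where
  compare (Hashed h1 x) (Hashed h2 y) = compare h1 h2 <> compare x y

-- | Stand-in string hash; any hash function would do here.
hashString :: String -> Int
hashString = foldl (\acc c -> (acc * 31 + ord c) `mod` 1000003) 7

mkHashed :: String -> Hashed String
mkHashed s = Hashed (hashString s) s

main :: IO ()
main = do
  let allDecls = Set.fromList (map mkHashed ["A.f", "A.g", "B.h"])
      usedDecls = Set.fromList (map mkHashed ["A.f"])
      weeds = Set.difference allDecls usedDecls
  print (Set.fromList (map hashedValue (Set.toList weeds)) == Set.fromList ["A.g", "B.h"])
```

Most comparisons inside the set then reduce to a single `Int` comparison, and the same cached hash could also feed a Bloom filter without hashing the declaration a second time.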
It might be even better to avoid hashing altogether and just use Then you just need to worry about the string-y values at the boundaries.
The timings are so similar that I'm a bit suspicious about whether anything is happening at all.
I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:
Perhaps this could still work with a better hash function or some more tuning with respect to how the bloom filter is constructed.
@ocharles I figured you might still like to see this, even though the experiment seems to have failed.
Feel free to close this PR