Try speeding up weeder with a bloom filter by NorfairKing · Pull Request #186 · ocharles/weeder

NorfairKing · 2024-12-16T10:28:37Z

I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:

Test with these changes:                       
weeder-check> real      0m32.576s
weeder-check> user      0m30.698s
weeder-check> sys       0m1.788s
 
Test without these changes:
weeder-check> real      0m31.765s
weeder-check> user      0m30.047s
weeder-check> sys       0m1.633s

Perhaps this could still work with a better hash function or some more tuning with respect to how the bloom filter is constructed.

@ocharles I figured you might still like to see this, even though the experiment seems to have failed.
Feel free to close this PR

NorfairKing · 2024-12-16T10:29:11Z

src/Weeder.hs

+-- TODO maybe we can make this faster by only hashing the location.
+instance Hashable Declaration where
+  hashIO32 d s = hashIO32 (declarationStableName d) s


This is definitely not fast, and could be why this approach doesn't work.

Could be worth trying the Uniquable instance of Module and OccName to produce hashes.

Note [The Unique of an OccName] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ They are efficient, because FastStrings have unique Int# keys. We assume this key is less than 2^24, and indeed FastStrings are allocated keys sequentially starting at 0. So we can make a Unique using mkUnique ns key :: Unique where 'ns' is a Char representing the name space. This in turn makes it easy to build an OccEnv. -}

(or maybe even better: the Uniquable instance of the original Names we derive the declarations from with nameToDeclaration)

NorfairKing · 2024-12-16T10:30:00Z

src/Weeder/Run.hs

+      -- The elem docs say:
+      -- @
+      -- If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
+      -- @
+      -- I.e. if some declaration is a weed, it will definitely show up in the result, but also some weeds will show up in the result.
+      -- So we need to do another set difference afterwards, but with a much smaller set.
+  in Set.difference (Set.filter (not . (`BloomFilter.elem` bloom)) allDecls) usedDecls


I'm about 50% sure I got this backward in some way, so that might be part of the issue as well.

NorfairKing · 2024-12-16T10:35:19Z

Another idea that might help make this work: We're now using both a hash and non-hash-based way of putting declarations in a set. If we could re-use the hash of a declaration then we would only need to hash it once and compare only the hash.

TeofilC · 2024-12-21T12:56:30Z

It might be even better to avoid hashing altogether and just use Ints as keys and IntMap/IntSet. (Maybe from the Unique's that @ryndubei mentions)

Then you just need to worry about the string-y values at the boundaries.

HashMap needs to do Eq checks on the unhashed values if the hashes are equal. So, you can get some savings if you can avoid that.

NorfairKing · 2024-12-23T13:10:02Z

I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:

The timings are so similar that I'm a bit suspicious about whether anything is happening at all..

no faster, it seems

105d53f

NorfairKing commented Dec 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try speeding up weeder with a bloom filter#186

Try speeding up weeder with a bloom filter#186
NorfairKing wants to merge 1 commit intoocharles:masterfrom
NorfairKing:bloomfilter

NorfairKing commented Dec 16, 2024

Uh oh!

NorfairKing Dec 16, 2024

Uh oh!

ryndubei Dec 20, 2024

Uh oh!

NorfairKing Dec 16, 2024

Uh oh!

NorfairKing commented Dec 16, 2024

Uh oh!

TeofilC commented Dec 21, 2024

Uh oh!

NorfairKing commented Dec 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

NorfairKing commented Dec 16, 2024

Uh oh!

NorfairKing Dec 16, 2024

Choose a reason for hiding this comment

Uh oh!

ryndubei Dec 20, 2024

Choose a reason for hiding this comment

Uh oh!

NorfairKing Dec 16, 2024

Choose a reason for hiding this comment

Uh oh!

NorfairKing commented Dec 16, 2024

Uh oh!

TeofilC commented Dec 21, 2024

Uh oh!

NorfairKing commented Dec 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants