The flex-header work shows that compilation spends most of its cycles finding NTs, both by name and by XT. That work also makes things a bit worse, since walking the linked list of NTs gets a bit slower.
It seems that a very simple random-replacement cache of recently used words could make a huge improvement. The cache would need to be invalidated on a few events: wordlist changes, markers, etc.
I'm curious to test the idea and see what the cost (code size, added complexity) vs benefit (speed) looks like, and how it varies with code size.
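As a rough sketch of the cache idea above, here is a minimal version in C. It is written in C purely for illustration and is not the actual implementation. Every name in it (`cache_lookup`, `cache_insert`, `cache_invalidate`, `CACHE_SLOTS`, `MAX_NAME`) is hypothetical, as are the 32-bit NT type, the 31-character name limit, and the case-sensitive comparison. Only the name-to-NT direction is shown; an XT-to-NT cache would presumably look much the same.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 8   /* hypothetical size; the right value needs measuring */
#define MAX_NAME    31  /* assumed maximum word-name length */

typedef struct {
    char     name[MAX_NAME + 1]; /* empty string marks a free slot */
    uint32_t nt;                 /* cached name token (assumed 32-bit) */
} cache_entry;

static cache_entry cache[CACHE_SLOTS];

/* Drop every entry. Call on wordlist changes, markers, etc. */
void cache_invalidate(void) {
    memset(cache, 0, sizeof cache);
}

/* Return 1 and store the NT in *nt on a hit, 0 on a miss. */
int cache_lookup(const char *name, uint32_t *nt) {
    assert(name != NULL && nt != NULL);
    if (name[0] == '\0') return 0;   /* empty names are never cached */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (strcmp(cache[i].name, name) == 0) {
            *nt = cache[i].nt;
            return 1;
        }
    }
    return 0;
}

/* Remember a name that the slow linked-list walk just resolved. */
void cache_insert(const char *name, uint32_t nt) {
    assert(name != NULL);
    size_t len = strlen(name);
    if (len == 0 || len > MAX_NAME) return;  /* not cacheable */

    int slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (strcmp(cache[i].name, name) == 0) { slot = i; break; } /* update in place */
        if (slot < 0 && cache[i].name[0] == '\0') slot = i;        /* first free slot */
    }
    if (slot < 0) slot = rand() % CACHE_SLOTS;  /* full: evict a random victim */

    memcpy(cache[slot].name, name, len + 1);
    cache[slot].nt = nt;
}
```

The intended use would be to call `cache_lookup` first, fall back to the normal list walk on a miss, and then call `cache_insert` with the result. Invalidating the whole cache on any dictionary change is the bluntest possible policy, but it keeps the added complexity small, which is part of what the experiment is meant to weigh.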