You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While working on #1587 I noticed that Instruments is showing Token::make_word as the second hottest single function, right after alloc::raw_vec::finish_grow.
Looking into the implementation I saw that its just doing a binary search across all keywords to find if its a known keyword or not. This is a fairly classical case where we have a known set of strings and want to check if a given string is in that list. There are a bunch of ways that we could speed this up. This issue is to figure out a good compromise between those possible speedups and other project constraints like maintaining a no_std ability.
My first approach at speeding this up was to create a table for the first byte in every keyword to reduce the number of entries that need to be searched. This small optimization managed to shave off about 400ms of time (of the 1.4ish seconds total).
However, there are other approaches that could speed this up even more. Either by generating parsing/lookup tables or using something like phf to do the heavy lifting for us.
The text was updated successfully, but these errors were encountered:
I don't know if this is useful, but I remember watching a video from Strager: Perfect Hash Tables. He had the same kind of problem and created a "custom" hash table by using the information he already has, the known keywords
In short, because he knew which words are keywords, he built his hash table around that by combining the first 2 and last 2 bytes and hashing that. And after that still compare. (This is of the top of my head)
I did notice there are about ~800 keywords, so I don't know if it's feasible, but maybe it's worth looking into it?
While working on #1587 I noticed that Instruments is showing
Token::make_word
as the second hottest single function, right afteralloc::raw_vec::finish_grow
.Looking into the implementation I saw that its just doing a binary search across all keywords to find if its a known keyword or not. This is a fairly classical case where we have a known set of strings and want to check if a given string is in that list. There are a bunch of ways that we could speed this up. This issue is to figure out a good compromise between those possible speedups and other project constraints like maintaining a
no_std
ability.My first approach at speeding this up was to create a table for the first byte in every keyword to reduce the number of entries that need to be searched. This small optimization managed to shave off about 400ms of time (of the 1.4ish seconds total).
However, there are other approaches that could speed this up even more. Either by generating parsing/lookup tables or using something like phf to do the heavy lifting for us.
The text was updated successfully, but these errors were encountered: