Description
I'm seeing an 80x-120x slowdown when I parse a string of special characters with the (c/plus (c/alt ...)) combinator, compared to the equivalent regex, (c/regexp #"[\a\b\c...]+").
By comparison, matching the same string with Clojure Spec2 Alpha's (s/+ #{\a \b \c ...}) is only about 20x slower than the regex.
The combinators are so much more readable, composable and less error-prone than string-based regular expressions, so I'd love to use them, but the performance difference is significant.
Is there a way to speed up a set match, or to write a manual matcher? It would be great if I could pass Instaparse an arbitrary function and have it consume whatever the function returns, unless the function returns a special :instaparse/invalid value.
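One possible stop-gap, sketched here under assumptions rather than as an Instaparse feature: keep the character set as a Clojure set for readability, and compile it into a single regex character class so Instaparse only ever sees one regexp terminal. The names charset->regex and escape-class-char are hypothetical helpers, and the escaping rule relies on Java regexes accepting a backslash before any non-alphabetic character.

```clojure
(ns example.charset-regex
  (:require [clojure.string :as str]))

(defn- escape-class-char
  "Escapes one character for use inside a Java regex character class.
  Java regexes allow a backslash before any non-alphabetic character, so
  everything except letters and digits is escaped; letters and digits are
  left alone because a backslash before them can denote a special construct."
  [ch]
  (if (Character/isLetterOrDigit (char ch))
    (str ch)
    (str \\ ch)))

(defn charset->regex
  "Hypothetical helper: compiles a non-empty set of characters into a regex
  that matches one or more characters drawn from that set."
  [chars]
  {:pre [(seq chars)]}
  ;; sort so the generated pattern is deterministic for a given set
  (re-pattern (str "[" (str/join (map escape-class-char (sort chars))) "]+")))
```

The resulting pattern could then be used in a grammar as (c/regexp (charset->regex charset)), the same way the hand-written regex is passed to c/regexp in the benchmark; whether this recovers the full regex speed has not been measured here.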
Here are my Criterium benchmarks (run on an i9 MBP16):
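```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as c]
         '[clojure.alpha.spec :as s]   ;; Spec2 (spec-alpha2)
         '[criterium.core :as crit])

(let [test-input "~|'[@:];|;{{}=:;<];~[^|{?"
      charset #{\: \; \< \= \> \? \@
                \[ \] \\ \^ \_ \'
                \{ \| \} \~}
      test-spec (s/+ charset) ;; (aside: Spec2 will moan about fully-qualified symbol #bug)
      ;; one alternative per character, repeated one or more times
      combinator-parser (insta/parser
                         {:S (c/plus (->> charset
                                          (map (comp c/unicode-char int))
                                          (apply c/alt)))}
                         :start :S)
      ;; the same character set written as a single regex character class
      regex-parser (insta/parser
                    {:S (c/regexp #"[\:\;\<=\>\?\@\[\]\\\^\_\'\{\|\}\~]+")}
                    :start :S)]
  ;; quick-bench times a single expression, so each candidate gets its own call
  (crit/quick-bench (s/conform test-spec (seq test-input)))     ;; mean: 117.3us
  (crit/quick-bench (insta/parse combinator-parser test-input)) ;; mean: 614us
  (crit/quick-bench (insta/parse regex-parser test-input)))     ;; mean: 5.43us
```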
(I also tried using the defparser macro, with similar results.)
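For reference, the ratios implied by the Criterium mean times above work out as follows, which is where the slowdown figures in the description come from:

$$
\frac{614\,\mu\text{s}}{5.43\,\mu\text{s}} \approx 113\times \ \text{(combinators vs. regex)}, \qquad
\frac{117.3\,\mu\text{s}}{5.43\,\mu\text{s}} \approx 21.6\times \ \text{(Spec2 vs. regex)}, \qquad
\frac{614\,\mu\text{s}}{117.3\,\mu\text{s}} \approx 5.2\times \ \text{(combinators vs. Spec2)}
$$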