Performance of (c/plus (c/alt ...)) combinator vs (c/regexp #"[...]+") #207

@theronic

I'm seeing an 80x-120x slowdown when parsing a string of special characters with the (c/plus (c/alt ...)) combinator, compared to a regex such as (c/regexp #"[\a\b\c...]+").

In comparison, matching on the same string using Clojure Spec2 Alpha with (s/+ #{\a \b \c ...}) is only 20x slower than the regexp.

The combinators are so much more readable, composable and less error-prone than writing string-based regular expressions, so I'd love to use them, but the performance difference is significant.

Is there a way to speed up a set match, or write a manual matcher? It would be great if I could pass Instaparse any function and it eats whatever the function returns, unless it's a special :instaparse/invalid value.

Here are my Criterium benchmarks (done on i9 MBP16):

;; Assumed aliases (not shown in the original snippet):
;;   s = clojure.alpha.spec (Spec2), insta = instaparse.core,
;;   c = instaparse.combinators, crit = criterium.core
(let [test-input        "~|'[@:];|;{{}=:;<];~[^|{?"
      charset           #{\: \; \< \= \> \? \@
                          \[ \] \\ \^ \_ \'
                          \{ \| \} \~}
      test-spec         (s/+ charset)   ;; (aside: Spec2 will moan about fully-qualified symbol #bug)
      combinator-parser (insta/parser
                          {:S (c/plus (->> charset
                                           (map (comp c/unicode-char int))
                                           (apply c/alt)))}
                          :start :S)
      regex-parser      (insta/parser
                          {:S (c/regexp #"[\:\;\<=\>\?\@\[\]\\\^\_\'\{\|\}\~]+")}
                          :start :S)]
  ;; quick-bench measures one expression at a time: comment out all but one
  (crit/quick-bench
    (s/conform test-spec (seq test-input))        ;; mean: 117.3us
    (insta/parse combinator-parser test-input)    ;; mean: 614us
    (insta/parse regex-parser test-input)))       ;; mean: 5.43us

(I also tried using the defparser macro, with similar results.)
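
One possible middle ground, as a minimal sketch rather than anything Instaparse provides: keep the character set as plain data and generate the character-class regex from it, so nobody has to hand-escape a regex string. `charset->pattern` is a hypothetical helper name introduced here for illustration; it uses only `clojure.core` and `java.util.regex` hex escapes (`\x{...}`), and it has not been benchmarked.

```clojure
;; Hypothetical helper (not part of Instaparse): build a regex that matches
;; one or more characters from `charset`. Each character is written as a
;; \x{...} hex escape, so no character needs manual escaping.
(defn charset->pattern
  [charset]
  (let [escaped (map #(format "\\x{%x}" (int %)) charset)]
    (re-pattern (str "[" (apply str escaped) "]+"))))

(def charset
  #{\: \; \< \= \> \? \@
    \[ \] \\ \^ \_ \'
    \{ \| \} \~})

(def special-chars (charset->pattern charset))

;; re-matches returns the whole string only if every character is in the set.
(re-matches special-chars "~|'[@:];|;{{}=:;<];~[^|{?")
(re-matches special-chars "abc")
```

The resulting pattern could presumably be handed to c/regexp in place of the hand-written literal above, which would keep the set readable while still going through the regex path. Whether it actually matches the 5.43us figure is an assumption I have not measured.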
