EBNF "special sequence" causes parse error #236

@pmonks

Description

An EBNF "special sequence" (? ... ?) does not appear to be parsed correctly by instaparse. For example, the fuzzy-dates EBNF grammar loaded in the steps below fails with an error on line 162, at column 28 (the question mark that opens the first of two special sequences).

Steps to reproduce

  1. Start a REPL with instaparse on the classpath
  2. Run this code:
(require '[clojure.string :as str])
(require '[instaparse.core :as ip])

(def ebnf-with-examples (slurp "https://raw.githubusercontent.com/dariusz-wozniak/fuzzy-dates/refs/heads/main/grammar/fuzzy-date.ebnf"))
(def ebnf-only (first (str/split ebnf-with-examples #"\Q(* --- Examples --- *)\E")))
(def p (ip/parser ebnf-only))

Expected result

p contains an instaparse parser for this EBNF grammar.

Actual result

Execution error at instaparse.util/throw-runtime-exception (util.clj:7).
Error parsing grammar specification:
Parse error at line 162, column 28:
character_in_calendar_id = ? any printable character except ')' and newline ? ;
                           ^
Expected one of:
!
&
ε
eps
EPSILON
epsilon
Epsilon
<
(
{
[
#"#\"[^\"\\]*(?:\\.[^\"\\]*)*\"(?x) #Double-quoted regexp"
#"#'[^'\\]*(?:\\.[^'\\]*)*'(?x) #Single-quoted regexp"
#"\"[^\"\\]*(?:\\.[^\"\\]*)*\"(?x) #Double-quoted string"
#"'[^'\\]*(?:\\.[^'\\]*)*'(?x) #Single-quoted string"
(*
#"[^, \r\t\n<>(){}\[\]+*?:=|'"#&!;./]+(?x) #Non-terminal"

Other considerations

I realise that properly supporting EBNF special sequences opens a can of worms around how their contents are to be interpreted, but at a minimum a more specific error message would be valuable.
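
As a rough illustration of what a more specific diagnostic could look like, here is a minimal sketch that scans a grammar string for `? ... ?` spans before it is handed to instaparse. The names `hits-in-line` and `find-special-sequences` are hypothetical and not part of instaparse. The scan is deliberately naive: it does not skip string literals or comments, and it would report false positives on grammars that use `?` as a postfix optional operator more than once on a line, so it only makes sense for ISO-style EBNF input.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of instaparse.
;; Collects every `? ... ?` span on a single line, with its 1-based column.
(defn hits-in-line [line]
  (let [^java.util.regex.Matcher m (re-matcher #"\?[^?]*\?" line)]
    (loop [acc []]
      (if (.find m)
        (recur (conj acc {:column (inc (.start m)) :text (.group m)}))
        acc))))

;; Hypothetical helper, not part of instaparse.
;; Returns a vector of {:line :column :text} maps, one per suspected
;; special sequence in the grammar string.
(defn find-special-sequences [grammar]
  (->> (str/split-lines grammar)
       (map-indexed (fn [idx line]
                      (map #(assoc % :line (inc idx)) (hits-in-line line))))
       (apply concat)
       vec))

;; Usage: report suspected special sequences in a small two-rule grammar.
(doseq [{:keys [line column text]}
        (find-special-sequences
         (str "a = 'x' ;\n"
              "character_in_calendar_id = ? any printable character except ')' and newline ? ;"))]
  (println (str "Possible EBNF special sequence at line " line
                ", column " column ": " text)))
```

For the rule quoted in the error above, the sketch reports column 28, the same position instaparse points at.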
