Skip to content

[Semantics/Pedagogy] "PEG operators" or "boolean grammar operators"? #239

@eternaleye

Description

@eternaleye

The grammar language of instaparse has been extended with three beyond-CFG operators:

  • &, currently described as "lookahead"
  • !, currently described as "negative lookahead"
  • /, currently described as "ordered choice"

Furthermore, it explains them as being PEG extensions to CFG. However, while PEG is a popular parsing paradigm, it does suffer from a shortage of formal study into its behavior, such as its closure properties.

However, two of the three operators also occur (with slightly different pedagogy) in a grammar family with clearer formal properties: the "boolean grammars", largely studied by Alexander Okhotin. This extends the context-free grammars with intersection and negation, and several nice properties have been proven about it:

  • It is closed under union, concatenation, and Kleene star, like CFGs
    • However, closure under rational transduction (lexing) seems as-yet unproven
  • It is also closed under intersection and negation
  • Its worst-case parsing bound is O(|G|n³), the same as nondeterministic CFG

As a result, it may make sense to separate the treatment of & and ! from the treatment of /, as they rest upon somewhat firmer formal ground.

Furthermore, / can be understood as a particular disambiguation strategy applied to the parse forest generated when using |; as a result, it does not necessarily increase the expressive power of the system due to instaparse actually generating a parse forest, unlike many other parser frameworks. Being able to express more nuanced disambiguation strategies as part of the grammar itself may offer valuable opportunities to streamline (and make more efficient) the experience of parsing ambiguous documents, and could make / merely a convenient syntax over a more uniform foundation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions