dine is a Parser Combinator Library targeting Python >=3.10.
- Python 3.10
At the moment, the project has not been published to pypi. You can instead install it directly from github as follows:
$ python -m pip install git+https://github.com/nathan-wien/dine.gitUsing the dine.parser.Parser class, you can create parsing functors. Functors are object that can be called like functions.
When called, each parsing functor (or parser for short) can accept either:
- a builtin
strobject, or - a
dine.parser.Streamobject.
The parser then returns an object of base class dine.result.ParseResult, which can either be:
- A
ParseSuccess(loc, val, rs)object if the parser parses successfully, where:loc: dine.stream.Locationis the parsed location (line and column) in the initial stream,valis the parsed value, andrs: dine.stream.Streamis the remaining stream after applying the parser.
- A
ParseFailure(loc, label, msg)object if the parser fails to parse, where:loc: dine.stream.Locationis the location (line and column) in the initial stream where the parser (first) fails to parse,label: strthe label of the parser that fails to parse, andmsg: stris the error message.
>>> from dine.parser import Parser
# functor that parses a digit
>>> digit_parser = Parser.digit()
>>> digit_parser('42')
ParseSuccess(
loc=(line=1,col=1),
val='4',
rs=Stream("2")
)
>>> digit_parser('hi')
ParseFailure(
loc=(line=1,col=1),
label='digit',
msg="unexpected character 'h'"))
)
# functor that parses a lowercase ASCII character
>>> lowercase_parser = Parser.ascii_lowercase()
>>> lowercase_parser('abc')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("bc")
)
>>> lowercase_parser('ABC')
ParseFailure(
loc=(line=1,col=1),
label='dine.parser.Parser.ascii_lowercase',
msg="unexpected character 'A'"))
)It is not a parser combinator library without the ability of combining parsers together to create more complex parsers.
The following shows some combinators that dine offers. For an exhaustive list of combinator, please refer to the documentation.
>>> from dine.parser import Parser
# apply a parser after the other
>>> Parser.char('a').and_then(Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val=('a', 'b'),
rs=Stream("$")
)
# alternatively
>>> (Parser.char('a') & Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val=('a', 'b'),
rs=Stream("$")
)
# apply another parser if the first one fails
>>> Parser.char('a').or_else(Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("b$")
)
# alternatively
>>> (Parser.char('a') | Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("b$")
)
# parse 1 or more digits
>>> digits_parser = Parser.digit().many1()
>>> digits_parser('123abc')
ParseSuccess(
loc=(line=1,col=1),
val=['1', '2', '3'],
rs=Stream("abc")
)
# You can convert the parsed value (the `val` field in a `ParsedSuccess` object)
# to anything you want using the `map` method. For example:
>>> num_parser = digits_parser.map(lambda digit_list: int("".join(digit_list)))
>>> num_parser('123abc')
ParseSuccess(
loc=(line=1,col=1),
val=123,
rs=Stream("abc")
)
# Parser that sequences a bunch of parsers, one after the other
>>> abc_parser = Parser.sequence(
... [Parser.char('a'), Parser.char('b'), Parser.char('c')]
... ).set_label('abc_parser')
>>> abc_parser('abc$')
ParseSuccess(
loc=(line=1,col=1),
val=['a', 'b', 'c'],
rs=Stream("$")
)
>>> abc_parser('$')
ParseFailure(
loc=(line=1,col=1),
label='abc_parser',
msg="unexpected character '$'"))
)
# Parser that parses a bunch of alternatives
>>> oneof_abc_parser = Parser.choice(
... [Parser.char('a'), Parser.char('b'), Parser.char('c')]
... ).set_label('oneof_abc_parser')
>>> oneof_abc_parser('c$')
ParseSuccess(
loc=(line=1,col=1),
val='c',
rs=Stream("$")
)
>>> oneof_abc_parser('d$')
ParseFailure(
loc=(line=1,col=1),
label='oneof_abc_parser',
msg="unexpected character 'd'"))
)
# Parsers that throw away things
>>> Parser.char('b').preceded_by(Parser.string("@"))("@b$")
ParseSuccess(
loc=(line=1,col=1),
val='b',
rs=Stream("$")
)
>>> Parser.char('b').succeeded_by(Parser.string("@"))("b@$")
ParseSuccess(
loc=(line=1,col=1),
val='b',
rs=Stream("$")
)
# Parser that parses a list of numbers separated by commas
>>> comma_parser = Parser.char(',')
>>> num_list_parser = num_parser.many1_sep_by(comma_parser)
>>> num_list_parser('5,15,250,1000')
ParseSuccess(
loc=(line=1,col=1),
val=[5, 15, 250, 1000],
rs=Stream("")
)The full documentation can be found here. The documentation will be updated with more details and examples in the future.
- Why is the minimum python version compatible with this library is 3.10?
- The implementation of this library makes heavy use of the structural pattern matching (a.k.a.
matchstatement) feature, which is only available on python 3.10 or later.
- The implementation of this library makes heavy use of the structural pattern matching (a.k.a.
- The COMP4403 course (Compilers and Interpreters) at the University of Queensland.
- Scott Wlaschin's talk on parser combinator and his blog posts on the topic.
- Max Bo's Parser Combinator Talk at UQCS.