This repo contains a set of tools for building compilers, including a lexer, parsers (LL, LR(0), LR(1)), and a debugger. The library aims to keep complexity low and to provide intuitive interfaces, so it leans toward the theoretical rather than the practical.

    #lang racket
    (require basic-cc/language)

    (define-language s-exp
      "\\s+"                            ; Ignore characters matching this pattern
      (L "\\(") (R "\\)") (SYM "\\w+")  ; Name of token must be [A-Z]+
      (form SYM (L items R) (L R))      ; form -> SYM | L items R | L R
      (items form (items form)))        ; items -> form | items form

    (define source "(flatten (() (1 (2))))")

    (s-exp-cut source)
    ;; #(#<L: "("> #<SYM: "flatten"> #<L: "("> #<L: "("> #<R: ")">
    ;;   #<L: "("> #<SYM: "1"> #<L: "("> #<SYM: "2"> #<R: ")">
    ;;   #<R: ")"> #<R: ")"> #<R: ")"> #<EOF: #f>)

    (s-exp-read source)
    ;; (form #<L: "(">
    ;;       (items
    ;;        (items (form #<SYM: "flatten">))
    ;;        (form #<L: "(">
    ;;              (items
    ;;               (items (form #<L: "("> #<R: ")">))
    ;;               (form #<L: "(">
    ;;                     (items (items (form #<SYM: "1">))
    ;;                            (form #<L: "(">
    ;;                                  (items (form #<SYM: "2">))
    ;;                                  #<R: ")">))
    ;;                     #<R: ")">))
    ;;              #<R: ")">))
    ;;       #<R: ")">)

The define-language macro defines a set of variables and functions for syntactic parsing. As shown above, the two most important are s-exp-cut, which splits the source into tokens, and s-exp-read, which parses it into a syntax tree. Both have the same signature:

    (language-cut/read in #:file filename)

where in is either an input port or a string, and filename is a string naming the source of the input, kept as metadata. Here language stands for the name given to define-language (s-exp in the example). In addition, define-language defines the following variables, which you can safely ignore:

- s-exp/lexicon: Internal lexer object
- s-exp/grammar: Classical context-free grammar, defined as a 4-tuple
- s-exp/automaton: LR(1) automaton derived from s-exp/grammar
- s-exp/table: General LR table for syntax parsing
- s-exp/language: 4-tuple of the variables above

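
To make the lexing step concrete, here is a minimal sketch in plain Racket of how an ordered list of regular-expression rules can cut a string into tokens, similar in spirit to what s-exp-cut returns. It does not use basic-cc and is not its implementation. The names skip, rules, and cut, and the representation of a token as a (NAME . text) pair, are assumptions made for illustration.

```racket
#lang racket
;; Hypothetical sketch, NOT basic-cc's implementation: cut a string into
;; (NAME . text) pairs using an ordered list of regex rules.

;; Characters to ignore, mirroring the "\\s+" line of the s-exp example.
(define skip #px"^\\s+")

;; Token rules, tried in order. Each pattern is anchored with ^ so that it
;; must match exactly at the current position.
(define rules
  (list (cons 'L   #px"^\\(")
        (cons 'R   #px"^\\)")
        (cons 'SYM #px"^\\w+")))

;; cut : string -> list of (symbol . string-or-#f), ending with (EOF . #f)
(define (cut str)
  (let loop ([pos 0] [acc '()])
    (cond
      ;; End of input: append the EOF marker and return tokens in order.
      [(= pos (string-length str))
       (reverse (cons (cons 'EOF #f) acc))]
      ;; Skippable text at pos: jump to the end of the match.
      [(regexp-match-positions skip str pos)
       => (λ (m) (loop (cdar m) acc))]
      [else
       ;; Find the first rule that matches at pos, or fail.
       (define-values (name end)
         (let try ([rs rules])
           (cond
             [(null? rs)
              (error 'cut "no rule matches at position ~a" pos)]
             [(regexp-match-positions (cdar rs) str pos)
              => (λ (m) (values (caar rs) (cdar m)))]
             [else (try (cdr rs))])))
       (loop end (cons (cons name (substring str pos end)) acc))])))
```

For example, (cut "(a (b))") yields a list of seven pairs that ends with (EOF . #f), while input that no rule matches, such as "#", raises an error. Trying the rules in order is one simple way to resolve overlaps between patterns; a real lexer may choose differently, for instance by preferring the longest match.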
Three options are provided for further control:

- (#:enable-EOL): Generate an EOL token for each newline (off by default)
- (#:allow-conflict): Prevent the compiler compiler from raising an exception when syntactic conflicts are found
- (#:driver driver): Specify the parsing method, where driver is LR.0 or LR.1

A bigger example can be found here.