(WIP) Initial work on nimble_parsec generated lexer #448
fishcakez wants to merge 1 commit into pinterest:master
jparise
left a comment
Should we also scan for (and reject) reserved words?
elixir-thrift/src/thrift_lexer.xrl
Lines 39 to 52 in 1610f61
  |> label("literal")
end

defp literal_with(char) do
Perhaps quoted_literal(quote_char)?
|> choice([
  unsigned_number(),
  empty()
  |> error("expected number")
I think this reads a little better as (like you do above):
error(empty(), "expected number")

def identifier() do
  ascii_char([?a..?z, ?A..?Z, ?_])
  |> repeat(ascii_char([?a..?z, ?A..?Z, ?_, ?0..?9]))
  |> reduce({List, :to_atom, []})
Consider extracting these three lines into their own function because we repeat them below.
@@ -0,0 +1,189 @@
defmodule Thrift.Parser.Nimble do
Do you think we'll have multiple parsecs in here? Otherwise, just Thrift.Parser.Lexer (lib/thrift/parser/lexer.ex) seems better.
{:dialyxir, "~> 0.5", only: :dev, runtime: false},

# Compile
{:nimble_parsec, "~> 0.4",

|> ignore()
|> concat(
  choice([
    utf8_char([?\\]) |> ignore() |> concat(delim),
Does this handle embedded newlines, etc. like we support in the current lexer?
elixir-thrift/src/thrift_lexer.xrl
Lines 91 to 105 in 1610f61
@jparise I'm curious why those reserved words are reserved. What was the idea behind them?
This is borrowed from Apache Thrift. The idea is that Thrift IDL shouldn't contain words that would clash with an output language's keywords when code is generated. There are two possible approaches:
Apache Thrift generally does (1) (at least for C++'s reserved words), so we copied that behavior in our lexer for compatibility. In practice, we should only be concerned with Elixir's reserved words (which aren't much of a problem) and could take approach (2). In this project, reserved word rejection was added in #295 by @thecodeboss.
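The rejection step discussed here can be sketched in plain Elixir as a check applied to each lexed identifier. This is only an illustration under stated assumptions: the module name `ReservedWordSketch`, the function `check_identifier/1`, and the shape of the error tuple are hypothetical, and the word list is Elixir's own reserved words rather than the list used by this project or by Apache Thrift.

```elixir
defmodule ReservedWordSketch do
  # Hypothetical module for illustration only.
  # Assumption: the lexer hands us identifiers as charlists, and we only
  # care about Elixir's reserved words (not the project's actual list).
  @reserved MapSet.new(
              ~w(after and catch do else end false fn in nil not or rescue true when)
            )

  # Returns {:ok, atom} for an acceptable identifier,
  # or {:error, message} when the identifier is a reserved word.
  def check_identifier(chars) when is_list(chars) do
    name = List.to_string(chars)

    if MapSet.member?(@reserved, name) do
      {:error, "cannot use reserved word #{inspect(name)} as an identifier"}
    else
      {:ok, String.to_atom(name)}
    end
  end
end
```

A check like this would run after an identifier has been matched, so the happy path pays only for one set lookup per identifier.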
Makes sense, it was a bit perplexing to see keywords for other languages in here.
Initial work on a nimble_parsec-based lexer. I tried to design for nice errors and to avoid backtracking on the happy path.