A syntax parser based on the LLLR method.
- Rust 1.56.0 or later
syn <INPUT> -g GRAMMAR [-p lllr] [-o OUTPUT]
The optional argument -o specifies the output file for a graph in the DOT language.
This option is only available with the LR parser.
Grammar files are defined using the TOML format.
The header contains the following entries:
name
: Name of the grammar.

description
: An optional description of the grammar. Defaults to the canonical path to the grammar file.

start_symbol
: Start symbol of the grammar. Defaults to the first rule in [rules].
Example:
name = "grammar"
description = "Example grammar for README"
start_symbol = "S"
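The start symbol default described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `resolve_start_symbol` and its parameters are hypothetical, and it assumes the rule names are available in the order they appear in the file.

```rust
/// Hypothetical helper: picks the explicit `start_symbol` header value if
/// present, otherwise falls back to the first rule name in file order.
pub fn resolve_start_symbol<'a>(
    explicit: Option<&'a str>,
    rule_names: &[&'a str],
) -> Option<&'a str> {
    explicit.or_else(|| rule_names.first().copied())
}
```

With rule names `S`, `A`, `B` and no explicit header value, the sketch selects `S`.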
The production rules are described in the [rules] table. The entry for a grammar symbol can be
either a single string or an array of strings, where each string is one alternative production for
that symbol. When the grammar file is parsed, a single string is converted to an array with one
element. To represent an ϵ production, use an empty string. The symbols and rules can be in any
order.
Example:
[rules]
# S → A B 'c' | 'a' A B 'b'
S = [
"A B c",
"a A B b",
]
# A → 'a' | ϵ
A = [
"a",
"",
]
# B → 'b'
B = "b"
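The conversion described above (a single string becomes a one-element array, and an empty string stands for ϵ) might look roughly like this in Rust. The `RuleEntry` type and the `normalize` function are hypothetical names chosen for illustration, and the sketch assumes that symbols within a production are separated by whitespace, as the example suggests.

```rust
/// Hypothetical in-memory form of one `[rules]` entry as written in the file.
#[derive(Debug, Clone, PartialEq)]
pub enum RuleEntry {
    /// For example `B = "b"`.
    Single(String),
    /// For example `A = ["a", ""]`.
    Many(Vec<String>),
}

/// Converts an entry into a list of alternatives, each a list of symbols.
/// An empty string yields an empty symbol list, which stands for ϵ.
pub fn normalize(entry: &RuleEntry) -> Vec<Vec<String>> {
    // A single string is treated as an array with one element.
    let alternatives: Vec<&str> = match entry {
        RuleEntry::Single(s) => vec![s.as_str()],
        RuleEntry::Many(v) => v.iter().map(String::as_str).collect(),
    };
    alternatives
        .iter()
        .map(|alt| alt.split_whitespace().map(str::to_string).collect())
        .collect()
}
```

Applied to the `A` entry above, the sketch produces two alternatives: one containing the symbol `a` and one that is empty.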
Regular expressions that match tokens during lexical analysis are described in the [tokens] table.
The patterns need to be properly escaped and written in a way that allows partial matching for the
incremental lexical analysis. Instead of a regular expression, you can specify a list of strings
that are matched as plain text.
Matching precedence is defined by the order of the regular expressions.
Example:
[tokens]
a = [
"true",
"false",
]
b = "'[A-Z\\x61-\\x7A_]*('|$)"
c = "[0-9]+"
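To illustrate order-based precedence, here is a simplified Rust sketch that handles only tokens given as lists of plain strings. It is an assumption-laden illustration rather than the tool's lexer: `LiteralToken` and `first_match` are hypothetical names, regular expressions and partial matching are left out, and the sketch assumes that the first definition that matches wins.

```rust
/// Hypothetical token definition restricted to plain-text alternatives.
pub struct LiteralToken<'a> {
    pub name: &'a str,
    pub literals: &'a [&'a str],
}

/// Returns the name and matched text of the first token, in definition
/// order, that has a literal matching the start of `input`.
pub fn first_match<'a>(
    tokens: &[LiteralToken<'a>],
    input: &str,
) -> Option<(&'a str, &'a str)> {
    for token in tokens {
        for lit in token.literals {
            if input.starts_with(*lit) {
                return Some((token.name, *lit));
            }
        }
    }
    None
}
```

In this sketch, a token defined earlier takes precedence even if a later token would match a longer prefix of the input.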
Regular expressions in the [ignore] table define tokens that are ignored during syntax analysis.
The patterns need to follow the same rules as those in the [tokens] table.
Example:
[ignore]
whitespace = "[ \t\r\n]*"
comment = "#.*(\n|$)"
The [actions] table specifies which action to prefer when a Shift/Reduce conflict occurs. This
avoids issues like the dangling else. Allowed values are shift and reduce.
Example:
[actions]
a = "shift"
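A conflict lookup driven by such a table could be sketched in Rust as below. The names `Preference`, `parse_preference`, and `resolve` are hypothetical, and the sketch assumes that each key in [actions] names the symbol on which the conflict occurs. What happens when a symbol has no entry is not stated here, so the sketch simply returns `None` in that case.

```rust
use std::collections::HashMap;

/// The two values allowed in the `[actions]` table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Preference {
    Shift,
    Reduce,
}

/// Parses one `[actions]` value; anything other than "shift" or "reduce"
/// is rejected.
pub fn parse_preference(value: &str) -> Result<Preference, String> {
    match value {
        "shift" => Ok(Preference::Shift),
        "reduce" => Ok(Preference::Reduce),
        other => Err(format!(
            "invalid action `{}`, expected `shift` or `reduce`",
            other
        )),
    }
}

/// Looks up the preferred action for a Shift/Reduce conflict on `symbol`.
/// Returns `None` if the table has no entry for that symbol.
pub fn resolve(
    actions: &HashMap<String, Preference>,
    symbol: &str,
) -> Option<Preference> {
    actions.get(symbol).copied()
}
```

For the example table above, a conflict on `a` would be resolved in favor of shifting.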