Skip to main content

module ParseTree

rascal-0.40.16

Library functions for parse trees.

Usage

import ParseTree;

Dependencies

extend Type;
extend Message;
extend List;

Description

A concrete syntax tree or parse tree is an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar.

Most Rascal users will encounter parse trees in the form of concrete values. Expert users may find the detailed description here useful when writing generic functions on parse trees.

In Rascal parse trees, the interior nodes are labeled by rules of the grammar, while the leaf nodes are labeled by terminals (characters) of the grammar.

Tree is the universal parse tree data type in Rascal and can be used to represent parse trees for any language.

  • Tree is a subtype of the type node
  • All types (non-terminals) declared in syntax, lexical, layout and keyword definitions are sub-types of Tree.
  • All concrete syntax expressions produce parse trees with a type corresponding to a non-terminal.
  • Trees can be annotated in various ways. Most importantly the \loc annotation always points to the source location of any (sub) parse tree.

Advanced users may want to create tools that analyze any parse tree, regardless of the syntax definition that generated it, you can manipulate them on the abstract level.

A parse tree is of type Tree using the auxiliary types Production, Symbol, Condition, Attr, Associativity, CharRange. Effectively, a parse tree is a nested tree structure of type Tree.

  • Most internal nodes are applications (appl) of a Production to a list of children Tree nodes. Production is the abstract representation of a rule in a syntax definition. which consists of a definition of an alternative for a Symbol by a list of Symbols.

  • The leaves of a parse tree are always characters (char), which have an integer index in the UTF8 table.

  • Some internal nodes encode ambiguity (amb) by pointing to a set of alternative Tree nodes.

The Production and Symbol types are an abstract notation for rules in syntax definitions, while the Tree type is the actual notation for parse trees.

Parse trees are called parse forests when they contain amb nodes.

You can analyze and manipulate parse trees in three ways:

  • Directly on the Tree level, just like any other algebraic data type.
  • Using concrete syntax expressions and concrete syntax patterns.
  • Using disambiguation actions (parameters of the parse function)

The type of a parse tree is the symbol that it's production produces, i.e. appl(prod(sort("A"),[],{}),[]) has type A. Ambiguity nodes Each such a non-terminal type has Tree as its immediate super-type.

Examples

rascal>import ParseTree;
ok
rascal>syntax A = "a";
ok

will make the following succeed:

rascal>t = parse(#A,"a");
A: (A) `a`
rascal>t := appl(
>>>>>>> prod(
>>>>>>> sort("A"),
>>>>>>> [lit("a")],
>>>>>>> {}),
>>>>>>> [appl(
>>>>>>> prod(
>>>>>>> lit("a"),
>>>>>>> [\char-class([range(97,97)])],
>>>>>>> {}),
>>>>>>> [char(97)])]);
bool: true

You see that the defined non-terminal A ends up as the production for the outermost node. As the only child is the tree for recognizing the literal a, which is defined to be a single a from the character-class [ a ].

When we use labels in the definitions, they also end up in the trees. The following definition

rascal>import ParseTree;
ok
rascal>lexical B= myB:"b";
ok
rascal>lexical C = myC:"c" B bLabel;
ok

Will make the following succeed:

rascal>t = parse(#C,"cb");
C: (C) `cb`
rascal>t := appl(
>>>>>>> prod(
>>>>>>> label(
>>>>>>> "myC",
>>>>>>> lex("C")),
>>>>>>> [
>>>>>>> lit("c"),
>>>>>>> label(
>>>>>>> "bLabel",
>>>>>>> lex("B"))
>>>>>>> ],
>>>>>>> {}),
>>>>>>> [appl(
>>>>>>> prod(
>>>>>>> lit("c"),
>>>>>>> [\char-class([range(99,99)])],
>>>>>>> {}),
>>>>>>> [char(99)]),appl(
>>>>>>> prod(
>>>>>>> label(
>>>>>>> "myB",
>>>>>>> lex("B")),
>>>>>>> [lit("b")],
>>>>>>> {}),
>>>>>>> [appl(
>>>>>>> prod(
>>>>>>> lit("b"),
>>>>>>> [\char-class([range(98,98)])],
>>>>>>> {}),
>>>>>>> [char(98)])])]);
bool: true

Here you see that the alternative name is a label around the first argument of prod while argument labels become labels in the list of children of a prod.

Benefits

  • Parse trees have all the necessary information in them for high-fidelity source code analysis and transformation
  • Parse trees contain full definitions of the grammar rules that it applies
  • Parse trees can always be "unparsed" to source text again

Pitfalls

  • For historical reasons the name of the annotation is "loc" and this interferes with the Rascal keyword loc for the type of source locations. Therefore the annotation name has to be escaped as \loc when it is declared or used.
  • We are in transition from deprecating the annotation @\loc with the keyword field .src=|unknown:///|. Currently the run-time already uses .src while the source code still uses @\loc.

data Tree

The Tree data type as produced by the parser.

data Tree  
= appl(Production prod, list[Tree] args)
| cycle(Symbol symbol, int cycleLength)
| amb(set[Tree] alternatives)
| char(int character)
;

A Tree defines the trees normally found after parsing; additional constructors exist for execptional cases:

  • ❶ Parse tree constructor when parse succeeded.
  • ❷ Cyclic parsetree.
  • ❸ Ambiguous subtree.
  • ❹ A single character.

data Production

Production in ParseTrees.

data Production  
= prod(Symbol def, list[Symbol] symbols, set[Attr] attributes)
| regular(Symbol def)
;

The type Production is introduced in Type, see Production. Here we extend it with the symbols that can occur in a ParseTree. We also extend productions with basic combinators allowing to construct ordered and un-ordered compositions, and associativity groups.

  • ❶ A prod is a rule of a grammar, with a defined non-terminal, a list of terminal and/or non-terminal symbols and a possibly empty set of attributes.

  • ❷ A regular is a regular expression, i.e. a repeated construct.

  • priority means operator precedence, where the order of the list indicates the binding strength of each rule;

  • assoc means all alternatives are acceptable, but nested on the declared side;

  • reference means a reference to another production rule which should be substituted there, for extending priority chains and such.

data Production

data Production  
= \priority(Symbol def, list[Production] choices)
| \associativity(Symbol def, Associativity \assoc, set[Production] alternatives)
| \reference(Symbol def, str cons)
;

data Attr

Attributes in productions.

data Attr  
= \bracket()
| \assoc(Associativity \assoc)
;

An Attr (attribute) documents additional semantics of a production rule. Neither tags nor brackets are processed by the parser generator. Rather downstream processors are activated by these. Associativity is a parser generator feature though.

data Associativity

Associativity attribute.

data Associativity  
= \left()
| \right()
| \assoc()
| \non-assoc()
;

Associativity defines the various kinds of associativity of a specific production.

data CharRange

Character ranges and character class.

data CharRange  
= range(int begin, int end)
;
  • CharRange defines a range of characters.
  • A CharClass consists of a list of characters ranges.

alias CharClass

list[CharRange]

data Symbol

Symbols that can occur in a ParseTree.

data Symbol  
= \start(Symbol symbol)
;

The type Symbol is introduced in Type, see Symbol, to represent the basic Rascal types, e.g., int, list, and rel. Here we extend it with the symbols that may occur in a ParseTree.

  • ❶ The start symbol wraps any symbol to indicate that it is a start symbol of the grammar and may occur at the root of a parse tree.
  • ❷ Context-free non-terminal
  • ❸ Lexical non-terminal
  • ❹ Layout symbols
  • ❺ Terminal symbols that are keywords
  • ❻ Parameterized context-free non-terminal
  • ❼ Parameterized lexical non-terminal
  • ❽ Terminal.
  • ❾ Case-insensitive terminal.
  • ❶⓿ Character class
  • ❶❶ Empty symbol
  • ❶❷ Optional symbol
  • ❶❸ List of one or more symbols without separators
  • ❶❹ List of zero or more symbols without separators
  • ❶❺ List of one or more symbols with separators
  • ❶❻ List of zero or more symbols with separators
  • ❶❼ Alternative of symbols
  • ❶❽ Sequence of symbols
  • ❶❾ Conditional occurrence of a symbol.

data Symbol

data Symbol  
= \sort(str name)
| \lex(str name)
| \layouts(str name)
| \keywords(str name)
| \parameterized-sort(str name, list[Symbol] parameters)
| \parameterized-lex(str name, list[Symbol] parameters)
;

data Symbol

data Symbol  
= \lit(str string)
| \cilit(str string)
| \char-class(list[CharRange] ranges)
;

data Symbol

data Symbol  
= \empty()
| \opt(Symbol symbol)
| \iter(Symbol symbol)
| \iter-star(Symbol symbol)
| \iter-seps(Symbol symbol, list[Symbol] separators)
| \iter-star-seps(Symbol symbol, list[Symbol] separators)
| \alt(set[Symbol] alternatives)
| \seq(list[Symbol] symbols)
;

data Symbol

data Symbol  
= \conditional(Symbol symbol, set[Condition] conditions)
;

function subtype

bool subtype(Symbol::\sort(_), Symbol::\adt("Tree", _))

data Condition

Datatype for declaring preconditions and postconditions on symbols.

data Condition  
= \follow(Symbol symbol)
| \not-follow(Symbol symbol)
| \precede(Symbol symbol)
| \not-precede(Symbol symbol)
| \delete(Symbol symbol)
| \at-column(int column)
| \begin-of-line()
| \end-of-line()
| \except(str label)
;

A Condition can be attached to a symbol; it restricts the applicability of that symbol while parsing input text. For instance, follow requires that it is followed by another symbol and at-column requires that it occurs at a certain position in the current line of the input text.

function priority

Nested priority is flattened.

Production priority(Symbol s, [*Production a, priority(Symbol _, list[Production] b), *Production c])

function associativity

Normalization of associativity.

Production associativity(Symbol s, Associativity as, {*Production a, choice(Symbol t, set[Production] b)})

Production associativity(Symbol rhs, Associativity a, {associativity(rhs, Associativity b, set[Production] alts), *Production rest})

Production associativity(Symbol s, Associativity as, {*Production a, priority(Symbol t, list[Production] b)})
  • The Choice constructor under associativity is flattened.
  • Nested (equal) associativity is flattened.
  • Priority under an associativity group defaults to choice.

function parse

Parse input text (from a string or a location) and return a parse tree.

&T<:Tree parse(type[&T<:Tree] begin, str input, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})

&T<:Tree parse(type[&T<:Tree] begin, str input, loc origin, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})

&T<:Tree parse(type[&T<:Tree] begin, loc input, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})
  • Parse a string and return a parse tree.
  • Parse a string and return a parse tree, origin defines the original location of the input.
  • Parse the contents of resource input and return a parse tree.

The parse either throws ParseError exceptions or returns parse trees of type Tree. See [[ParseTree]].

The allowAmbiguity flag dictates the behavior of the parser in the case of ambiguity. When allowAmbiguity=true the parser will construct ambiguity clusters (local sets of parse trees where the input string is ambiguous). If it is false the parser will throw an Ambiguous exception instead. An Ambiguous exception is comparable to a ParseError exception then. The latter option terminates much faster, i.e. always in cubic time, and always linear in the size of the intermediate parse graph, while constructing ambiguous parse forests may grow to O(n^p+1), where p is the length of the longest production rule and n is the length of the input.

The filters set contains functions which may be called optionally after the parse algorithm has finished and just before the Tree representation is built. The set of functions contain alternative functions, only on of them is successfully applied to each node in a tree. If such a function fails to apply, the other ones are tried. There is no fixed-point computation, so composed filters must be added to the set of filters programmatically. Post-parse filtering is best done at this stage and not later on the Tree representation for efficiency reasons. Namely, the size of the parse graph before Tree construction is still cubic due to "binarized" sharing of intermediate nodes, while after Tree construction the forest may obtain a size in O(n^p+1) where n is the length of the input and p is the length of the longest syntax rule. Filtering using the filters parameter, on the other hand, may very well cut the forest quickly down to even a linear size and result in an efficient overall parsing algorithm.

The hasSideEffects flag is normally set to false. When the filters functions have side-effects to remove ambiguity, this option must be set to true to ensure correct behavior. A side-effect of filter functions is typically the construction of a symbol table and the removal (see [[Statements/Filter]]) of syntax trees which refer to undefined symbols. In such a case hasSideEffects must be set to true for correctness' sake. If its set to false then the algorithm assumes tree construction is context-free and it can memoize the results of shared intermediate graph nodes. The tree construction algorithm is effectively always worst case polynomial in O(n^p+1) --p being the length of the longest syntax rule-- when hasSideEffects is true, but may be linear when set to false. So this is quite an important flag to consider.

Examples

rascal>lexical Number = [0-9]+;
ok
rascal>syntax Exp
>>>>>>> = Number
>>>>>>> | left Exp "+" Exp
>>>>>>> ;
ok
rascal>
rascal>import ParseTree;
ok

Seeing that parse returns a parse tree:

rascal>parse(#Exp, "2+3");
Exp: (Exp) `2+3`

Catching a parse error:

rascal>import IO;
ok
rascal>try {
>>>>>>> Exp e = parse(#Exp, "2@3");
>>>>>>>}
>>>>>>>catch ParseError(loc l): {
>>>>>>> println("I found a parse error at line <l.begin.line>, column <l.begin.column>");
>>>>>>>}
I found a parse error at line 1, column 1
ok

function parser

Generates a parser from an input grammar.

&T (value input, loc origin) parser(type[&T] grammar, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})

This builtin function wraps the Rascal parser generator by transforming a grammar into a parsing function.

The resulting parsing function has the following overloaded signature:

  • Tree parse(str input, loc origin);
  • Tree parse(loc input, loc origin);

So the parse function reads either directly from a str or via the contents of a loc. It also takes a origin parameter which leads to the prefix of the src fields of the resulting tree.

The parse function behaves differently depending of the given keyword parameters:

 *  `allowAmbiguity`: if true then no exception is thrown in case of ambiguity and a parse forest is returned. if false,
the parser throws an exception during tree building and produces only the first ambiguous subtree in its message.
if set to `false`, the parse constructs trees in linear time. if set to `true` the parser constructs trees in polynomial time.
*
* `hasSideEffects`: if false then the parser is a lot faster when constructing trees, since it does not execute the parse _actions_ in an
interpreted environment to make side effects (like a symbol table) and it can share more intermediate results as a result.

function firstAmbiguityFinder

Generates a parser function that can be used to find the left-most deepest ambiguous sub-sentence.

Tree (value input, loc origin) firstAmbiguityFinder(type[Tree] grammar, bool hasSideEffects=false, set[Tree(Tree)] filters={})

Benefits

  • Instead of trying to build a polynomially sized parse forest, this function only builds the smallest part of the tree that exhibits ambiguity. This can be done very quickly, while the whole forest could take minutes to hours to construct.
  • Use this function for ambiguity diagnostics and regression testing for ambiguity.

Pitfalls

  • The returned sub-tree usually has a different type than the parameter of the type[] symbol that was passed in. The reason is that sub-trees typically have a different non-terminal than the start non-terminal of a grammar.

function parsers

Generates parsers from a grammar (reified type), where all non-terminals in the grammar can be used as start-symbol.

&U (type[&U] nonterminal, value input, loc origin) parsers(type[&T] grammar, bool allowAmbiguity=false, bool hasSideEffects=false,  set[Tree(Tree)] filters={})

This parser generator behaves the same as the parser function, but it produces parser functions which have an additional nonterminal parameter. This can be used to select a specific non-terminal from the grammar to use as start-symbol for parsing.

function firstAmbiguityFinders

Generates a parser function that can be used to find the left-most deepest ambiguous sub-sentence.

Tree (type[Tree] nonterminal, value input, loc origin) firstAmbiguityFinders(type[Tree] grammar, bool hasSideEffects=false,  set[Tree(Tree)] filters={})

Benefits

  • Instead of trying to build a polynomially sized parse forest, this function only builds the smallest part of the tree that exhibits ambiguity. This can be done very quickly, while the whole forest could take minutes to hours to construct.
  • Use this function for ambiguity diagnostics and regression testing for ambiguity.

Pitfalls

  • The returned sub-tree usually has a different type than the parameter of the type[] symbol that was passed in. The reason is that sub-trees typically have a different non-terminal than the start non-terminal of a grammar.

function firstAmbiguity

Parse the input but instead of returning the entire tree, return the trees for the first ambiguous substring.

Tree firstAmbiguity(type[Tree] begin, str input)

Tree firstAmbiguity(type[Tree] begin, loc input)

This function is similar to the Parse function in its functionality. However, in case of serious ambiguity parse could be very slow. This function is much faster, because it does not try to construct an entire forest and thus avoids the cost of constructing nested ambiguity clusters.

If the input sentence is not ambiguous after all, simply the entire tree is returned.

function storeParsers

Generate a parser and store it in serialized form for later reuse.

void storeParsers(type[Tree] grammar, loc saveLocation)

The stored parsers would be able to be recovered later using Load Parsers.

For any concrete grammar, a.k.a. reified syntax type, g it holds that:

  • after storeParsers(g, file);
  • then g = loadParsers(file);
  • and given h = parsers(g);
  • then for all valid nt, input and origin: g(nt, input, origin) == h(nt, input, origin)

In other words, a loaded parser function behaves exactly as a freshly generated parser for the same grammar, if (and only if) it was stored for the same grammar value.

Benefits

  • storing parsers is to cache the work of reifing a grammar, and generating a parser from that grammar.
  • stored parsers are nice for deployment scenerios where the language is fixed and efficiency is appreciated.

Pitfalls

  • caching parsers with storeParsers is your responsibility; the cache is not cleaned automatically when the grammar changes.

function loadParsers

Load a previously serialized parser from disk for usage.

&U (type[&U] nonterminal, value input, loc origin) loadParsers(loc savedParsers, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})

For any concrete grammar, a.k.a. reified syntax type, g it holds that:

  • after storeParsers(g, file);
  • then g = loadParsers(file);
  • and given h = parsers(g);
  • then for all valid nt, input and origin: g(nt, input, origin) == h(nt, input, origin)

In other words, a loaded parser function behaves exactly as a freshly generated parser for the same grammar, if (and only if) it was stored for the same grammar value.

Examples

First we store a parser:

rascal>import ParseTree;
ok
rascal>syntax E = E "+" E | "e";
ok
rascal>storeParsers(#E, |memory://test-tmp/E.parsers|)
ok

Here we show a new shell does not even know about the grammar:

rascal>#E
|prompt:///|(1,1,<1,1>,<1,2>): Undeclared type: E
Advice: |https://www.rascal-mpl.org/docs/Rascal/Errors/CompileTimeErrors/UndeclaredType|
ok

Then in a next run, we load the parser and use it:

rascal>import ParseTree;
ok
rascal>p = loadParsers(|memory://test-tmp/E.parsers|);
&U (type[&U], value, loc): function(|file:///home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/ParseTree.rsc|(23355,2,<538,164>,<538,166>))
rascal>p(type(sort("E"), ()), "e+e", |src:///|);
&U<:Tree: appl(
prod(
sort("E"),
[
sort("E"),
layouts("$default$"),
lit("+"),
layouts("$default$"),
sort("E")
],
{}),
[appl(
prod(
sort("E"),
[lit("e")],
{}),
[appl(
prod(
lit("e"),
[\char-class([range(101,101)])],
{}),
[char(101)])],
src=|src:///|(0,1,<1,0>,<1,1>)),appl(
prod(
layouts("$default$"),
[],
{}),
[],
src=|src:///|(1,0,<1,1>,<1,1>)),appl(
prod(
lit("+"),
[\char-class([range(43,43)])],
{}),
[char(43)]),appl(
prod(
layouts("$default$"),
[],
{}),
[],
src=|src:///|(2,0,<1,2>,<1,2>)),appl(
prod(
sort("E"),
[lit("e")],
{}),
[appl(
prod(
lit("e"),
[\char-class([range(101,101)])],
{}),
[char(101)])],
src=|src:///|(2,1,<1,2>,<1,3>))],
src=|src:///|(0,3,<1,0>,<1,3>))

Benefits

  • loaded parsers can be used immediately without the need of loadig and executing a parser generator.

Pitfalls

  • reifiying types (use of #) will trigger the loading of a parser generator anyway. You have to use this notation for types to avoid that: type(\start(sort("MySort")), ()) to avoid the computation for #start[A]

function loadParser

Load a previously serialized parser, for a specific non-terminal, from disk for usage.

&U (value input, loc origin) loadParser(type[&U] nonterminal, loc savedParsers, bool allowAmbiguity=false, bool hasSideEffects=false, set[Tree(Tree)] filters={})

This loader behaves just like Load Parsers, except that the resulting parser function is already bound to a specific non-terminal.

function unparse

Yield the string of characters that form the leafs of the given parse tree.

str unparse(Tree tree)

unparse is the inverse function of Parse, i.e., for every syntactically correct string TXT of type S, the following holds:

unparse(parse(#S, TXT)) == TXT

Examples

rascal>syntax Exp = Exp "+" Exp | Number;
ok
rascal>lexical Number = [0-9]+;
ok
rascal>import ParseTree;
ok

First parse an expression, this results in a parse tree. Then unparse this parse tree:

rascal>unparse(parse(#Exp, "2+3"));
str: "2+3"
---
2+3
---

function printSymbol

str printSymbol(Symbol sym, bool withLayout)

function implode

Implode a parse tree according to a given (ADT) type.

&T<:value implode(type[&T<:value] t, Tree tree)

Given a grammar for a language, its sentences can be parsed and the result is a parse tree (or more precisely a value of type Tree). For many applications this is sufficient and the results are achieved by traversing and matching them using concrete patterns.

In other cases, the further processing of parse trees is better done in a more abstract form. The abstract syntax for a language is a data type that is used to represent programs in the language in an abstract form. Abstract syntax has the following properties:

  • It is "abstract" in the sense that it does not contain textual details such as parentheses, layout, and the like.
  • While a language has one grammar (also known as, concrete syntax) it may have several abstract syntaxes for different purposes: type analysis, code generation, etc.

The function implode bridges the gap between parse tree and abstract syntax tree. Given a parse tree and a Rascal type it traverses them simultaneously and constructs an abstract syntax tree (a value of the given type) as follows:

  • Literals, layout and empty (i.e. ()) nodes are skipped.

  • Regular */+ lists are imploded to lists or sets depending on what is expected in the ADT.

  • Ambiguities are imploded to sets.

  • If the expected type is str the tree is unparsed into a string. This happens for both lexical and context-free parse trees.

  • If a tree's production has no label and a single AST (i.e. non-layout, non-literal) argument (for instance, an injection), the tree node is skipped, and implosion continues with the lone argument. The same applies to bracket productions, even if they are labeled.

  • If a tree's production has no label, but more than one argument, the tree is imploded to a tuple (provided this conforms to the ADT).

  • Optionals are imploded to booleans if this is expected in the ADT. This also works for optional literals, as shown in the example below.

  • An optional is imploded to a list with zero or one argument, iff a list type is expected.

  • If the argument of an optional tree has a production with no label, containing a single list, then this list is spliced into the optional list.

  • For trees with (cons-)labeled productions, the corresponding constructor in the ADT corresponding to the non-terminal of the production is found in order to make the AST.

  • If the provided type is node, (cons-)labeled trees will be imploded to untyped nodes. This means that any subtrees below it will be untyped nodes (if there is a label), tuples of nodes (if a label is absent), and strings for lexicals.

  • Unlabeled lexicals are imploded to str, int, real, bool depending on the expected type in the ADT. To implode lexical into types other than str, the PDB parse functions for integers and doubles are used. Boolean lexicals should match "true" or "false". NB: lexicals are imploded this way, even if they are ambiguous.

  • If a lexical tree has a cons label, the tree imploded to a constructor with that name and a single string-valued argument containing the tree's yield.

An IllegalArgument exception is thrown if during implosion a tree is encountered that cannot be imploded to the expected type in the ADT. As explained above, this function assumes that the ADT type names correspond to syntax non-terminal names, and constructor names correspond to production labels. Labels of production arguments do not have to match with labels in ADT constructors.

Finally, source location fields are propagated as keyword fields on constructor ASTs. To access them, the user is required to explicitly declare a keyword field on all ADTs used in implosion. In other words, for every ADT type T, add:

data T(loc location=|unknown:///|);

Examples

Here are some examples for the above rules.

  • Example for rule 5. Given the grammar
syntax IDTYPE = Id ":" Type;
syntax Decls = decls: "declare" {IDTYPE ","}* ";";

Decls will be imploded as:

data Decls = decls(list[tuple[str,Type]]);

(assuming Id is a lexical non-terminal).

  • Example for rule 6. Given the grammar
syntax Formal = formal: "VAR"? {Id ","}+ ":" Type;

The corresponding ADT could be:

data Formal = formal(bool, list[str], Type);
  • Example for rule 8. Given the grammar
syntax Tag = "[" {Modifier ","}* "]";
syntax Decl = decl: Tag? Signature Body;

In this case, a Decl is imploded into the following ADT:

data Decl = decl(list[Modifier], Signature, Body);  
  • Example for rule 9. Given the grammar
syntax Exp = left add: Exp "+" Exp;

Can be imploded into:

data Exp = add(Exp, Exp);

data TreeSearchResult

Tree search result type for Tree At.

data TreeSearchResult[&T<:Tree]  
= treeFound(&T tree)
| treeNotFound()
;

function treeAt

Select the innermost Tree of a given type which is enclosed by a given location.

TreeSearchResult[&T<:Tree] treeAt(type[&T<:Tree] t, loc l, Tree a:appl(_, _))

default TreeSearchResult[&T<:Tree] treeAt(type[&T<:Tree] t, loc l, Tree root)

function sameType

bool sameType(label(_,Symbol s),Symbol t)

bool sameType(Symbol s,label(_,Symbol t))

bool sameType(Symbol s,conditional(Symbol t,_))

bool sameType(conditional(Symbol s,_), Symbol t)

bool sameType(Symbol s, s)

default bool sameType(Symbol s, Symbol t)

function isNonTerminalType

Determine if the given type is a non-terminal type.

bool isNonTerminalType(Symbol::\sort(str _))

bool isNonTerminalType(Symbol::\lex(str _))

bool isNonTerminalType(Symbol::\layouts(str _))

bool isNonTerminalType(Symbol::\keywords(str _))

bool isNonTerminalType(Symbol::\parameterized-sort(str _, list[Symbol] _))

bool isNonTerminalType(Symbol::\parameterized-lex(str _, list[Symbol] _))

bool isNonTerminalType(Symbol::\start(Symbol s))

default bool isNonTerminalType(Symbol s)