module lang::box::util::Tree2Box
The default formatting rules for any parse tree.
Usage
import lang::box::util::Tree2Box;
Dependencies
import ParseTree;
import lang::box::\syntax::Box;
import String;
Description
This module is meant to be extended to include rules specific for a language.
The main goal of this module is to minimize the number of necessary specializations for any specific programming language.
This module is a port of the original default formatting rules of "Pandora" in the ASF+SDF Meta-Environment, which were implemented in C with the ATerm library and APIgen, as described in:
M.G.J. van den Brand, A.T. Kooiker, Jurgen J. Vinju, and N.P. Veerman. A Language Independent Framework for Context-sensitive Formatting. In CSMR '06: Proceedings of the Conference on Software Maintenance and Reengineering, pages 103-112, Washington, DC, USA, 2006. IEEE Computer Society Press.
However, because pattern matching in Rascal is more powerful than in C with the ATerm library, we can specialize for more cases more easily than in the original paper. For example, single-line and multi-line comment styles are automatically recognized.
The current algorithm, when not extended, additionally guarantees that no comments are lost, as long as their grammar rules have been tagged with @category="Comment".
Another new feature is the normalization of case-insensitive literals. By providing toUpper() or toLower() as the ci option of FormatOptions, the mapping algorithm will change every instance of a case-insensitive literal accordingly before translating it to an L box expression. In the case of asIs(), the literal is printed as it occurred in the source code.
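As a minimal sketch of how the normalization option might be passed along, the following helper formats a tree with upper-cased case-insensitive literals. The module name and the helper formatUpperCase are hypothetical; the sketch assumes that toBox accepts the opts keyword parameter as listed in the signatures on this page and that format from lang::box::util::Box2Text can be called with only a Box.

```rascal
module FormatUpperSketch // hypothetical module name

import ParseTree;
import lang::box::util::Tree2Box;
import lang::box::util::Box2Text;

// Hypothetical helper: maps a parse tree to a Box with every
// case-insensitive literal normalized to upper case, then renders it.
str formatUpperCase(Tree tree)
    = format(toBox(tree, opts=formatOptions(ci=toUpper())));
```

For a grammar without case-insensitive literals, such as the Pico grammar used in the examples, this option should make no visible difference.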
Examples
rascal>import lang::box::\syntax::Box;
ok
rascal>extend lang::box::util::Tree2Box;
ok
Notice how we used extend and not import, which will be important in the following.
rascal>import lang::pico::\syntax::Main;
ok
First, let's get an example program text:
rascal>example = "begin
>>>>>>> '%% this is an example Pico program
>>>>>>> ' declare
>>>>>>> ' a : %inline comment% natural,
>>>>>>> ' b : natural;
>>>>>>> ' a := a + b;
>>>>>>> ' b := a - b;
>>>>>>> ' a := a - b
>>>>>>> 'end";
str: "begin\n%% this is an example Pico program\n declare\n a : %inline comment% natural,\n b : natural;\n a := a + b;\n b := a - b;\n a := a - b\nend"
---
begin
%% this is an example Pico program
declare
a : %inline comment% natural,
b : natural;
a := a + b;
b := a - b;
a := a - b
end
---
Now we parse it:
rascal>program = [start[Program]] example;
value: appl(
prod(
start(sort("Program")),
[
layouts("Layout"),
label(
"top",
sort("Program")),
layouts("Layout")
],
{}),
[appl(
prod(
layouts("Layout"),
[conditional(
\iter-star(lex("WhitespaceAndComment")),
{\not-follow(\char-class([
range(9,10),
range(13,13),
range(32,32),
range(37,37)
]))})],
{}),
[appl(
regular(\iter-star(lex("WhitespaceAndComment"))),
[],
src=|prompt:///|(0,0,<1,0>,<1,0>))],
src=|prompt:///|(0,0,<1,0>,<1,0>)),appl(
prod(
label(
"program",
sort("Program")),
[
lit("begin"),
layouts("Layout"),
label(
"decls",
sort("Declarations")),
layouts("Layout"),
label(
"body",
\iter-star-seps(
sort("Statement"),
[
layouts("Layout"),
lit(";"),
layouts("Layout")
])),
layouts("Layout"),
lit("end")
],
{}),
[appl(
prod(
lit("begin"),
[
\char-class([range(98,98)]),
\char-class([range(101,101)]),
\char-class([range(103,103)]),
\char-class([range(105,105)]),
\char-class([range(110,110)])
],
{}),
[char(98),char(101),char(103),char(105),char(110)]),appl(
prod(
layouts("Layout"),
[conditional(
\iter-star(lex("WhitespaceAndComment")),
{\not-follow(\char-class([
range(9,10),
range(13,13),
range(32,32),
range(37,37)
]))})],
{}),
[appl(
regular(\iter-star(lex("WhitespaceAndComment"))),
[appl(
prod(
lex("WhitespaceAndComment"),
[\char-class([
range(9,10),
range(13,13),
range(32,32)
])],
{}),
[char(10)],
src=|prompt:///|(5,1,<1,5>,<2,0>)),appl(
prod(
lex("WhitespaceAndComment"),
[
lit("%%"),
conditional(
\iter-star(\char-class([
range(1,9),
range(11,1114111)
])),
{\end-of-line()})
],
{tag("category"("Comment"))}),
[appl(
prod(
lit("%%"),
[
\char-class([range(37,37)]),
\char-class([range(37,37)])
],
{}),
[char(37),char(37)]),appl(
regular(\iter-star(\char-class([
range(1,9),
range(11,1114111)
]))),
[char(32),char(116),char(104),char(105),char(115),char(32),char(105),char(115),char(32),char(97),char(110),char(32),char(101),char(120),char(97),char(109),char(112),char(108),char(101),char(32),char(80),char(105),char(99),char(111),char(32),char(112),char(114),char(111),char(103),char(114),char(97),char(109)],
src=|prompt:///|(8,32,<2,2>,<2,34>))],
src=|prompt:///|(6,34,<2,0>,<2,34>)),appl(
prod(
lex("WhitespaceAndComment"),
[\char-class([
...
Then we can convert it to a Box tree:
rascal>b = toBox(program);
Box: HV([
U([]),
V([
L("begin"),
I([V([
U([V([H(
[
L("%%"),
H(
[
L("this"),
L("is"),
L("an"),
L("example"),
L("Pico"),
L("program")
],
hs=1)
],
hs=1)])]),
V([
L("declare"),
I([V([
U([]),
V(
[G(
[
HV([
L("a"),
U([]),
L(":"),
U([HV(
[
L("%"),
L("inline"),
L("comment"),
L("%")
],
hs=1)]),
HV([L("natural")])
]),
U([]),
L(","),
U([]),
HV([
L("b"),
U([]),
L(":"),
U([]),
HV([L("natural")])
])
],
op=function(|file:///home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/lang/box/syntax/Box.rsc|(2504,18,<42,6>,<42,24>)),
hs=0,
gs=4)],
hs=1),
U([])
])]),
L(";")
]),
U([]),
V(
[G(
[
HV([
L("a"),
U([]),
L(":="),
U([]),
U([
HV([L("a")]),
U([]),
L("+"),
U([]),
HV([L("b")])
])
]),
U([]),
L(";"),
U([]),
HV([
L("b"),
U([]),
L(":="),
U([]),
U([
HV([L("a")]),
U([]),
L("-"),
U([]),
HV([L("b")])
])
]),
U([]),
L(";"),
U([]),
HV([
L("a"),
U([]),
L(":="),
U([]),
U([
HV([L("a")]),
U([]),
...
Finally, we can format the box tree to get a prettier format:
rascal>import lang::box::util::Box2Text;
ok
rascal>format(b)
str: "begin\n %% this is an example Pico program\n declare\n a : % inline comment % natural,b : natural\n ;\n a := a + b;b := a - b;\n a := a - b\nend\n"
---
begin
%% this is an example Pico program
declare
a : % inline comment % natural,b : natural
;
a := a + b;b := a - b;
a := a - b
end
---
If you are not happy with this result, you can write a specialization:
rascal>Box toBox((Program) `begin <Declarations decls> <{Statement ";"}* body> end`, FormatOptions opts=formatOptions())
>>>>>>> = V([
>>>>>>> L("begin"),
>>>>>>> I([
>>>>>>> toBox(decls)
>>>>>>> ], is=2),
>>>>>>> I([
>>>>>>> toBox(body)
>>>>>>> ], is=4),
>>>>>>> L("end")
>>>>>>> ]);
Box (Program, FormatOptions opts = ...): function(|prompt:///|(0,277,<1,0>,<11,7>))
and we see the result here:
rascal>format(toBox(program));
str: "begin\n declare\n a : % inline comment % natural,b : natural\n ;\n a := a + b;b := a - b;\n a := a - b\nend\n"
---
begin
declare
a : % inline comment % natural,b : natural
;
a := a + b;b := a - b;
a := a - b
end
---
data FormatOptions
Configuration options for toBox.
data FormatOptions
= formatOptions(
CaseInsensitivity ci = asIs()
)
;
data CaseInsensitivity
Normalization choices for case-insensitive literals.
data CaseInsensitivity
= toLower()
| toUpper()
| asIs()
;
function toBox
This is the generic default formatter.
default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo())
This generic formatter is meant to be overridden by someone constructing a formatting tool for a specific language.
The goal is that this default toBox rule maps syntax trees to plausible Box expressions, so that only a minimal amount of specialization by the user is necessary.
function toBox
For ambiguity clusters an arbitrary choice is made.
default Box toBox(amb({Tree t, *Tree _}), FO opts=fo())
function toBox
When we end up here, we simply render the Unicode code point back.
default Box toBox(c:char(_), FormatOptions opts=fo() )
function toBox
Cycles are invisible and zero length.
default Box toBox(cycle(_, _), FO opts=fo())
alias FO
Private type alias for legibility's sake.
FormatOptions
function delabel
Removing production labels up front avoids repeating similar patterns in the main toBox function.
Production delabel(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs))
default Production delabel(Production p)
list[Symbol] delabel(list[Symbol] syms)
Symbol delabel(label(_, Symbol s))
default Symbol delabel(Symbol s)
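The label-stripping idea behind these signatures can be sketched as follows. The name stripLabel is hypothetical and the bodies are an assumption that only mirrors the listed signatures; the actual implementation may treat more cases.

```rascal
import ParseTree;

// A labeled symbol is replaced by the symbol it wraps.
Symbol stripLabel(label(_, Symbol s)) = s;
// Any other symbol is returned unchanged.
default Symbol stripLabel(Symbol s) = s;

// A production with a labeled definition loses that label,
// and the labels on its right-hand side symbols as well.
Production stripLabel(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs))
    = prod(s, [stripLabel(x) | x <- syms], attrs);
// Other productions are returned unchanged in this sketch.
default Production stripLabel(Production p) = p;
```

After this step, a pattern over prod(sort("Program"), ...) matches regardless of whether the grammar author labeled the alternative.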
function fo
This is a short-hand for legibility's sake.
FO fo()
function ci
Implements normalization of case-insensitive literals.
str ci(str word, toLower())
str ci(str word, toUpper())
str ci(str word, asIs())
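The three overloads dispatch on the constructor passed as the second argument. A self-contained sketch of that dispatch style is shown below, using a hypothetical stand-in type Casing and a hypothetical function normalize so that it does not clash with the definitions in this module; the function bodies are assumptions.

```rascal
import String;

// Hypothetical stand-in for CaseInsensitivity.
data Casing = lower() | upper() | keep();

// One overload per constructor; pattern matching selects the right one.
str normalize(str word, lower()) = toLowerCase(word);
str normalize(str word, upper()) = toUpperCase(word);
str normalize(str word, keep())  = word;
```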
function words
Split a text by the supported whitespace characters.
list[str] words(str text)
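One way such a split could be written is with a regular expression match used as a generator in a list comprehension. The name splitWords is hypothetical, and the set of separators (space, tab, carriage return, newline) is an assumption, since the supported whitespace characters are not listed here.

```rascal
// Collects every maximal run of non-whitespace characters, in order.
// Separators assumed: space, tab, carriage return, and newline.
list[str] splitWords(str text)
    = [w | /<w:[^\ \t\r\n]+>/ := text];
```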