
module lang::xml::DOM

rascal-0.40.16

Functions for reading and writing XML files to and from a "DOM" representation.

Usage

import lang::xml::DOM;

Dependencies

import Node;

Description

XML is a widely used markup language for encoding and exchanging documents.

The Document Object Model (DOM) is a cross-platform and language-independent way of representing and manipulating HTML, XHTML and XML documents. In this module the DOM is represented as Rascal data types, using keyword parameters for the optional attributes.

The IO module takes a different approach: each XML document is mapped to a value of Rascal's node type, which gives a more direct one-to-one mapping than the DOM encoding used here. If you are studying XML documents in general, this module is the place to be. If you are reading specific data that only happens to be encoded as XML, have a look at IO instead.

The following data types and functions are provided:

data Node

Datatypes for representing an instance of the DOM.

data Node (map[str key, str val] attrs = ())
  = document(Node root)
  | attribute(Namespace namespace, str name, str text)
  | element(Namespace namespace, str name, list[Node] children)
  | charData(str text)
  | cdata(str text)
  | comment(str text)
  | pi(str target, str text)
  | entityRef(str name)
  | charRef(int code)
  ;

data Namespace

data Namespace
  = namespace(str prefix, str uri)
  | none()
  ;
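As a sketch of how these constructors fit together, the hypothetical module below builds the DOM for a small note by hand (using none() for elements without a namespace) and counts its element nodes with a descendant (deep) match. The module name NoteDOM and the names smallNote and countElements are illustrative only; they are not part of lang::xml::DOM.

```rascal
module NoteDOM

import List;
import lang::xml::DOM;

// Hand-built DOM for <note><to>Jurgen</to><from>Paul</from></note>.
// none() marks elements that have no XML namespace.
public Node smallNote =
  document(element(none(), "note", [
    element(none(), "to",   [charData("Jurgen")]),
    element(none(), "from", [charData("Paul")])
  ]));

// Count element nodes at any depth using the descendant pattern (/).
// The document wrapper itself is not an element, so it is not counted.
public int countElements(Node n) = size([e | /e:element(_, _, _) := n]);
```

Here countElements(smallNote) counts the note, to and from elements.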

function implode

value implode(document(Node root))

value implode(element(Namespace _, str name, list[Node] kids))

value implode(charData(str t))

value implode(cdata(str t))

default value implode(Node x)

function toXML

Node toXML(node x)

default Node toXML(value x)

function attribute

Auxiliary constructor for an XML attribute without a namespace.

Node attribute(str name, str text)

function element

Auxiliary constructor for an XML element without a namespace.

Node element(str name, list[Node] kids)
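A minimal sketch of the two auxiliary constructors, assuming (as "without a namespace" suggests) that they simply fill in none() as the namespace of the full attribute and element constructors. The module name NoNamespace and the two variable names are hypothetical.

```rascal
module NoNamespace

import lang::xml::DOM;

// Built with the auxiliary functions: no namespace argument needed.
public Node viaHelpers =
  element("heading", [attribute("font", "bold"), charData("Reminder")]);

// The same node spelled out with the full constructors and explicit none().
public Node viaConstructors =
  element(none(), "heading", [attribute(none(), "font", "bold"), charData("Reminder")]);
```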

function parseXMLDOM

Parse an XML document and return a DOM instance.

Node parseXMLDOM(str src)

Examples

Read the sample note file, parse it, and construct a DOM instance.

rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>N = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>parseXMLDOM(N);
Node: document(element(
  none(),
  "note",
  [
    charData("\n"),
    element(
      none(),
      "to",
      [charData("Jurgen")]),
    charData("\n"),
    element(
      none(),
      "to",
      [charData("Tijs")]),
    charData("\n"),
    element(
      none(),
      "from",
      [charData("Paul")]),
    charData("\n"),
    element(
      none(),
      "date",
      [charData("2012-04-01")]),
    charData("\n"),
    element(
      none(),
      "heading",
      [
        attribute(
          none(),
          "font",
          "bold"),
        charData("Reminder")
      ]),
    charData("\n"),
    element(
      none(),
      "body",
      [charData("Don\'t forget to run the Rascal tests!")]),
    charData("\n")
  ]))

The DOM instance preserves every character of the source file, including the newlines between elements, which show up as charData("\n") nodes. As expected, the result is of type Node.
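Because the result is an ordinary Rascal value, it can be queried with pattern matching. The hypothetical helper below collects the text of every to element using a descendant pattern inside a set comprehension; it assumes, as in note.xml, that each to element holds exactly one charData child. The module name Recipients and the function recipients are illustrative.

```rascal
module Recipients

import lang::xml::DOM;

// Collect the text of every <to> element, wherever it occurs in the DOM.
// Assumes each <to> element has a single charData child, as in note.xml.
public set[str] recipients(Node dom)
  = { t | /element(_, "to", [charData(str t)]) := dom };
```

A set is used so the result does not depend on traversal order.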

function parseXMLDOMTrim

Parse an XML document and trim it (remove layout).

Node parseXMLDOMTrim(str src)

Examples

Read the sample note file, parse it, and construct a DOM instance (using parseXMLDOMTrim).

rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>N = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>parseXMLDOMTrim(N);
Node: document(element(
  none(),
  "note",
  [
    element(
      none(),
      "to",
      [charData("Jurgen")]),
    element(
      none(),
      "to",
      [charData("Tijs")]),
    element(
      none(),
      "from",
      [charData("Paul")]),
    element(
      none(),
      "date",
      [charData("2012-04-01")]),
    element(
      none(),
      "heading",
      [
        attribute(
          none(),
          "font",
          "bold"),
        charData("Reminder")
      ]),
    element(
      none(),
      "body",
      [charData("Don\'t forget to run the Rascal tests!")])
  ]))

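The layout (the whitespace-only text between elements) has been removed, so no charData("\n") nodes occur in the trimmed DOM instance; whitespace inside text content, as in the body element, is kept. Compare this with the output of parseXMLDOM.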
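The difference between the two parsers can be checked directly. The sketch below defines a hypothetical predicate hasLayout that reports whether a DOM still contains a whitespace-only charData node, plus a hypothetical convenience loadTrimmed that reads a file and parses it without layout. The module name Layout and both function names are illustrative assumptions.

```rascal
module Layout

import IO;
import String;
import lang::xml::DOM;

// True if some charData node in the DOM consists only of whitespace.
public bool hasLayout(Node dom) = any(/charData(str s) := dom, trim(s) == "");

// Read a file and parse it, dropping layout between elements.
public Node loadTrimmed(loc file) = parseXMLDOMTrim(readFile(file));
```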

function xmlRaw

Convert a DOM instance to a raw XML string.

str xmlRaw(Node x)

Examples

Read the sample note file, parse it, construct a DOM instance, and convert it to a string:

rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlRaw(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\r\n\<to\>Jurgen\</to\>\r\n\<to\>Tijs\</to\>\r\n\<from\>Paul\</from\>\r\n\<date\>2012-04-01\</date\>\r\n\<heading font=\"bold\"\>Reminder\</heading\>\r\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\r\n\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>

---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok

Apart from the added XML declaration, the line endings (xmlRaw emits \r\n where F contains \n) and a trailing newline, the original source file F and the output S of xmlRaw are identical.
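A common use of xmlRaw is to write back a DOM after transforming it while keeping the original layout. The sketch below uses a visit to replace the text of matching charData nodes and then serializes the result; the module name Rename and the function renameAndPrint with its parameters are hypothetical.

```rascal
module Rename

import lang::xml::DOM;

// Replace every text node whose content is exactly oldText, then serialize.
// visit rebuilds the tree; all other nodes (and the layout) are unchanged.
public str renameAndPrint(Node dom, str oldText, str newText) {
  Node renamed = visit (dom) {
    case charData(str s) => charData(newText) when s == oldText
  };
  return xmlRaw(renamed);
}
```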

function xmlCompact

Convert a DOM instance to a compact XML string (with minimal white space).

str xmlCompact(Node x)

Examples

Read the sample note file, parse it, construct a DOM instance, and convert it to a string:

rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlCompact(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\<to\>Jurgen\</to\>\<to\>Tijs\</to\>\<from\>Paul\</from\>\<date\>2012-04-01\</date\>\<heading font=\"bold\"\>Reminder\</heading\>\<body\>Don\'t forget to run the Rascal tests!\</body\>\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note><to>Jurgen</to><to>Tijs</to><from>Paul</from><date>2012-04-01</date><heading font="bold">Reminder</heading><body>Don't forget to run the Rascal tests!</body></note>

---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note><to>Jurgen</to><to>Tijs</to><from>Paul</from><date>2012-04-01</date><heading font="bold">Reminder</heading><body>Don't forget to run the Rascal tests!</body></note>
ok

The output S of xmlCompact is the original source file F with all layout between elements removed and an XML declaration added.
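When a program has to offer several output styles, the three serializers can be selected at run time. The sketch below is a hypothetical dispatcher keyed on a style name; the module name Serialize, the function serialize and the style strings are assumptions, and unknown styles fall back to xmlRaw.

```rascal
module Serialize

import lang::xml::DOM;

// Choose a serializer by style name; anything else keeps the raw layout.
public str serialize(Node dom, str style)
  = (style == "compact") ? xmlCompact(dom)
  : ((style == "pretty") ? xmlPretty(dom) : xmlRaw(dom));
```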

function xmlPretty

Convert a DOM instance to a pretty-printed XML string.

str xmlPretty(Node x)

Examples

Read the sample note file, parse it, construct a DOM instance, and convert it to a string:

rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlPretty(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\r\n  \<to\>Jurgen\</to\>\r\n  \<to\>Tijs\</to\>\r\n  \<from\>Paul\</from\>\r\n  \<date\>2012-04-01\</date\>\r\n  \<heading font=\"bold\"\>Reminder\</heading\>\r\n  \<body\>Don\'t forget to run the Rascal tests!\</body\>\r\n\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Jurgen</to>
  <to>Tijs</to>
  <from>Paul</from>
  <date>2012-04-01</date>
  <heading font="bold">Reminder</heading>
  <body>Don't forget to run the Rascal tests!</body>
</note>

---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Jurgen</to>
  <to>Tijs</to>
  <from>Paul</from>
  <date>2012-04-01</date>
  <heading font="bold">Reminder</heading>
  <body>Don't forget to run the Rascal tests!</body>
</note>
ok

The output S of xmlPretty is a pretty-printed version of the original source file F with an XML declaration added. Observe that the elements inside <note> ... </note> are indented.
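xmlPretty works just as well on a DOM built by hand rather than parsed. The sketch below pretty-prints a one-element document built with the auxiliary element and attribute constructors; attribute children are rendered as XML attributes of their enclosing element, as in the heading example above. The module name PrettyHeading and the function headingXML are hypothetical.

```rascal
module PrettyHeading

import lang::xml::DOM;

// Pretty-print a small hand-built document whose root element has an
// attribute child and a text child.
public str headingXML()
  = xmlPretty(document(element("heading", [attribute("font", "bold"), charData("Reminder")])));
```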