module lang::xml::DOM
Functions for reading and writing XML files to and from a "DOM" representation.
Usage
import lang::xml::DOM;
Dependencies
import Node;
Description
XML is a widely used markup language for encoding and exchanging documents.
The Document Object Model (DOM) is a cross-platform and language-independent way of representing and manipulating HTML, XHTML and XML documents. In this module we represent the DOM as a Rascal data type, using keyword parameters for the optional attributes.
In IO a different approach is taken, where each XML document is mapped to an instance of the node class, which gives a more direct one-to-one mapping as opposed to the DOM encoding here.
If you are studying XML documents in general, then the current module is the place to be. If you
are reading in specific data which is only accidentally encoded as XML, then have a look at IO.
The following constructors, data types, and functions are provided:
- attribute
- cdata
- charData
- charRef
- comment
- document
- element
- entityRef
- namespace
- none
- pi
- Namespace
- Node
- attribute
- element
- implode
- parseXMLDOM
- parseXMLDOMTrim
- toXML
- xmlCompact
- xmlPretty
- xmlRaw
data Node
Datatypes for representing an instance of the DOM.
data Node (map[str key, str val] attrs = ())
= document(Node root)
| attribute(Namespace namespace, str name, str text)
| element(Namespace namespace, str name, list[Node] children)
| charData(str text)
| cdata(str text)
| comment(str text)
| pi(str target, str text)
| entityRef(str name)
| charRef(int code)
;
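As an illustration of how these constructors fit together, the following sketch builds a small DOM value by hand instead of parsing it. The variable names `heading` and `doc` are only chosen for this example.

```rascal
import lang::xml::DOM;

// Hand-built DOM for: <heading font="bold">Reminder</heading>
// Attributes appear as attribute(...) nodes in the list of children,
// just as in the parser output shown later on this page.
Node heading = element(none(), "heading", [
    attribute(none(), "font", "bold"),
    charData("Reminder")
]);

// A complete document wraps a single root node.
Node doc = document(heading);
```

Because these are ordinary Rascal constructors, the resulting values can be compared with `==` and taken apart with pattern matching.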
data Namespace
data Namespace
= namespace(str prefix, str uri)
| none()
;
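A minimal sketch of an element that carries a namespace, as opposed to `none()`. The prefix, the URI and the element name are example values picked for illustration; they do not come from this module.

```rascal
import lang::xml::DOM;

// An element in a namespace; prefix and URI are illustrative values only.
Node schema = element(
    namespace("xs", "http://www.w3.org/2001/XMLSchema"),
    "schema",
    []);

// The same element name without any namespace.
Node plain = element(none(), "schema", []);
```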
function implode
value implode(document(Node root))
value implode(element(Namespace _, str name, list[Node] kids))
value implode(charData(str t))
value implode(cdata(str t))
default value implode(Node x)
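The signatures above only say that `implode` turns a DOM `Node` into a `value`; the exact shape of that value is not described here. A cautious way to use the result is therefore to narrow it with pattern matches first. The helper `describe` below is hypothetical, and the idea that text nodes become strings and elements become nodes is an assumption based on the overloads listed above.

```rascal
import lang::xml::DOM;
import Node;

// Hypothetical helper: report what kind of value implode produced.
// Assumption: the result is typically a str or a generic node.
str describe(Node dom) {
    value v = implode(dom);
    if (str s := v) {
        return "text: <s>";
    }
    if (node n := v) {
        return "node named <getName(n)>";
    }
    return "some other value";
}
```

For example, `describe(element(none(), "to", [charData("Jurgen")]))` can be used to inspect what an element is imploded to.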
function toXML
Node toXML(node x)
default Node toXML(value x)
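Going by its signatures, `toXML` converts a generic Rascal `node` (or any other value) into a DOM `Node`. The sketch below feeds it a small node literal and serializes the result. Wrapping the result in `document(...)` before serializing is an assumption made here to stay on the safe side, since the examples on this page only serialize complete documents.

```rascal
import lang::xml::DOM;

// A generic Rascal node value, written as a node literal.
node n = "note"("to"("Jurgen"), "from"("Paul"));

// Convert to a DOM value and wrap it in a document (assumed to be required).
Node dom = document(toXML(n));

// Serialize with one of the printers from this module.
str xml = xmlPretty(dom);
```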
function attribute
Auxiliary constructor for XML attribute without namespace.
Node attribute(str name, str text)
function element
Auxiliary constructor for XML element without namespace.
Node element(str name, list[Node] kids)
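A short sketch contrasting the auxiliary forms with the full constructors. It assumes, in line with the descriptions above, that the short forms simply fill in `none()` for the namespace; the names `short` and `full` are only used for this example.

```rascal
import lang::xml::DOM;

// Short forms: no Namespace argument.
Node short = element("heading", [
    attribute("font", "bold"),
    charData("Reminder")
]);

// Full constructors with an explicit none() namespace.
Node full = element(none(), "heading", [
    attribute(none(), "font", "bold"),
    charData("Reminder")
]);

// Expected to be true if the short forms only supply none().
bool same = short == full;
```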
function parseXMLDOM
Parse an XML document and return a DOM instance.
Node parseXMLDOM(str src)
Examples
Read the sample note file, parse it, and construct a DOM instance.
rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>N = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>parseXMLDOM(N);
Node: document(element(
none(),
"note",
[
charData("\n"),
element(
none(),
"to",
[charData("Jurgen")]),
charData("\n"),
element(
none(),
"to",
[charData("Tijs")]),
charData("\n"),
element(
none(),
"from",
[charData("Paul")]),
charData("\n"),
element(
none(),
"date",
[charData("2012-04-01")]),
charData("\n"),
element(
none(),
"heading",
[
attribute(
none(),
"font",
"bold"),
charData("Reminder")
]),
charData("\n"),
element(
none(),
"body",
[charData("Don\'t forget to run the Rascal tests!")]),
charData("\n")
]))
The DOM instance contains every single character (including spaces and newlines) as they appear in the source file. As expected, the result is of type Node.
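Since the result is an ordinary Rascal value, it can be queried with pattern matching. The sketch below uses a descendant match to collect the text of all `to` elements; the function name `recipients` is hypothetical.

```rascal
import lang::xml::DOM;

// Collect the text of every <to> element anywhere in a DOM tree.
// Only matches <to> elements whose sole child is a charData node.
list[str] recipients(Node dom) =
    [ t | /element(_, "to", [charData(str t)]) := dom ];
```

Applied to the DOM above, `recipients(parseXMLDOM(N))` should contain the two recipient names, `Jurgen` and `Tijs`.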
function parseXMLDOMTrim
Parse an XML document and trim it (remove layout).
Node parseXMLDOMTrim(str src)
Examples
Read the sample note file, parse it, and construct a DOM instance (using parseXMLDOMTrim).
rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>N = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>parseXMLDOMTrim(N);
Node: document(element(
none(),
"note",
[
element(
none(),
"to",
[charData("Jurgen")]),
element(
none(),
"to",
[charData("Tijs")]),
element(
none(),
"from",
[charData("Paul")]),
element(
none(),
"date",
[charData("2012-04-01")]),
element(
none(),
"heading",
[
attribute(
none(),
"font",
"bold"),
charData("Reminder")
]),
element(
none(),
"body",
[charData("Don\'t forget to run the Rascal tests!")])
]))
All layout (whitespace-only text between elements) has been removed and does not occur in the trimmed DOM instance. Compare this with the output of parseXMLDOM.
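The difference between the two parsers can be made explicit by counting the text nodes that consist of whitespace only. This is a minimal sketch; `layoutNodes` is a hypothetical helper, and it relies on `trim` from the standard `String` module.

```rascal
import lang::xml::DOM;
import String;

// Count charData nodes that contain nothing but whitespace.
int layoutNodes(Node dom) =
    ( 0 | it + 1 | /charData(str t) := dom, trim(t) == "" );
```

Judging by the two outputs shown on this page, `layoutNodes(parseXMLDOM(N))` would count the seven `charData("\n")` nodes, while `layoutNodes(parseXMLDOMTrim(N))` would find none.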
function xmlRaw
Convert a DOM instance to a raw XML string.
str xmlRaw(Node x)
Examples
Read the sample note file, parse it, construct a DOM instance, and convert it to a string:
rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlRaw(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\r\n\<to\>Jurgen\</to\>\r\n\<to\>Tijs\</to\>\r\n\<from\>Paul\</from\>\r\n\<date\>2012-04-01\</date\>\r\n\<heading font=\"bold\"\>Reminder\</heading\>\r\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\r\n\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
Apart from an extra XML header and the use of \r\n instead of \n as line ending, the original source file F and the output S of xmlRaw are identical.
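One way to check this kind of claim for other inputs is a round trip: parse the source, serialize it with `xmlRaw`, parse the result again, and compare the two DOM values. The function `roundTrips` is a hypothetical helper, and whether it returns `true` for every document is not guaranteed by anything stated here.

```rascal
import lang::xml::DOM;

// Hypothetical helper: does parse -> xmlRaw -> parse give back the same DOM?
bool roundTrips(str src) {
    Node original = parseXMLDOM(src);
    Node reparsed = parseXMLDOM(xmlRaw(original));
    return original == reparsed;
}
```

Comparing DOM values rather than strings keeps the added XML header out of the comparison.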
function xmlCompact
Convert a DOM instance to a compact XML string (with minimal white space).
str xmlCompact(Node x)
Examples
Read the sample note file, parse it, construct a DOM instance, and convert it to a string:
rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlCompact(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\<to\>Jurgen\</to\>\<to\>Tijs\</to\>\<from\>Paul\</from\>\<date\>2012-04-01\</date\>\<heading font=\"bold\"\>Reminder\</heading\>\<body\>Don\'t forget to run the Rascal tests!\</body\>\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note><to>Jurgen</to><to>Tijs</to><from>Paul</from><date>2012-04-01</date><heading font="bold">Reminder</heading><body>Don't forget to run the Rascal tests!</body></note>
---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note><to>Jurgen</to><to>Tijs</to><from>Paul</from><date>2012-04-01</date><heading font="bold">Reminder</heading><body>Don't forget to run the Rascal tests!</body></note>
ok
The output S of xmlCompact is a version of the original source file F with all white space between elements removed. White space inside text content, such as the body text, is kept.
function xmlPretty
Convert a DOM instance to a pretty printed XML string.
str xmlPretty(Node x)
Examples
Read the sample note file, parse it, construct a DOM instance, and convert it to a string:
rascal>import IO;
ok
rascal>import lang::xml::DOM;
ok
rascal>F = readFile(|lib://rascal/org/rascalmpl/library/lang/xml/examples/note.xml|);
str: "\<note\>\n\<to\>Jurgen\</to\>\n\<to\>Tijs\</to\>\n\<from\>Paul\</from\>\n\<date\>2012-04-01\</date\>\n\<heading font=\"bold\"\>Reminder\</heading\>\n\<body\>Don\'t forget to run the Rascal tests!\</body\>\n\</note\>"
---
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(F);
<note>
<to>Jurgen</to>
<to>Tijs</to>
<from>Paul</from>
<date>2012-04-01</date>
<heading font="bold">Reminder</heading>
<body>Don't forget to run the Rascal tests!</body>
</note>
ok
rascal>S = xmlPretty(parseXMLDOM(F));
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\r\n\<note\>\r\n \<to\>Jurgen\</to\>\r\n \<to\>Tijs\</to\>\r\n \<from\>Paul\</from\>\r\n \<date\>2012-04-01\</date\>\r\n \<heading font=\"bold\"\>Reminder\</heading\>\r\n \<body\>Don\'t forget to run the Rascal tests!\</body\>\r\n\</note\>\r\n"
---
<?xml version="1.0" encoding="UTF-8"?>
<note>
 <to>Jurgen</to>
 <to>Tijs</to>
 <from>Paul</from>
 <date>2012-04-01</date>
 <heading font="bold">Reminder</heading>
 <body>Don't forget to run the Rascal tests!</body>
</note>
---
rascal>println(S);
<?xml version="1.0" encoding="UTF-8"?>
<note>
 <to>Jurgen</to>
 <to>Tijs</to>
 <from>Paul</from>
 <date>2012-04-01</date>
 <heading font="bold">Reminder</heading>
 <body>Don't forget to run the Rascal tests!</body>
</note>
ok
The output S of xmlPretty is a pretty printed version of the original source file F.
Observe that the elements inside <note> ... </note> are indented.