Document Object Model (DOM) Tutorial
This tutorial covers the basics of the XML DOM. Before reading it, you should
already be familiar with XML and DTDs. My XML and DTD tutorials cover that
background.
DOM stands for Document Object Model.
The DOM is an interface that exposes an XML document as a tree structure
composed of nodes. The DOM lets you programmatically navigate the tree
and add, change, and delete any of its elements.
The DOM programming interface standards are defined by the
World Wide Web Consortium (W3C). The W3C site provides a comprehensive
reference of the XML DOM.
However, the discussion here focuses on Microsoft's implementation of XML
and the XML DOM. All samples therefore require Internet Explorer 5.0 or later,
which includes the Msxml parser. All references to the Msxml parser, in text
or in sample code, assume Msxml version 2.5 or later. For more information, or
to download Microsoft's XML products, visit Microsoft's site.
To manipulate an XML document you first load it into your computer's memory
using an XML parser. As stated above, the parser discussed here is the Msxml
parser from Microsoft. Once the XML document is loaded, its data can be
manipulated using a DOM.
A DOM treats the XML document as a tree. The DocumentElement
is the top or root of the tree. This root element can have one or more
child nodes which represent the branches of the tree.
The four main objects exposed by a DOM are the DOMDocument,
XMLDOMNode, XMLDOMNodeList and XMLDOMNamedNodeMap,
all of which are discussed in the sections below.
For most XML documents, the most common types of nodes are element, attribute,
and text. Attributes differ from the other node types because they are not
considered child nodes of a parent. A separate programming interface, the XMLDOMNamedNodeMap,
is used for attributes.
Creating a DOM
Using the XML DOM begins when you create a DOMDocument object. You can then
load, parse, navigate, and manipulate XML files. The VB code to create a
DOMDocument is:
Dim xmlDoc As New DOMDocument
Loading and Saving Data
Use the load or loadXML methods to load XML into the DOM. The load method
takes a path or URL to an XML file. The loadXML method takes a string
containing the XML data. After either load call below, xmlDoc contains a tree
consisting of the parsed contents of reports.xml.
xmlDoc.load("http://xmlfiles/reports.xml")
xmlDoc.load("c:\temp\reports.xml")
xmlDoc.loadXML("<customer><first_name>Joe</first_name>" & _
    "<last_name>Smith</last_name></customer>")
To save a parsed XML document to a file, use the save
method. It can take a file name as a string.
xmlDoc.save("c:\temp\reports.xml")
Load and Parse Flags
File loading and parsing is done asynchronously by default. This means your app
is free to do other work while the file is being loaded. Also by default, any
well formed XML document can be loaded.
You can change this behavior by setting a few properties. Here, the XML is
loaded synchronously, and validated against a DTD.
Also, any external references in the DTD are resolved.
xmlDoc.async = False
xmlDoc.validateOnParse = True
xmlDoc.resolveExternals = True
xmlDoc.load("reports.xml")
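If the default asynchronous mode is left on, load returns before parsing has finished, so the tree should not be touched until loading completes. The following is a minimal sketch of one way to wait for that, not a definitive pattern. It assumes the code lives in a form module (WithEvents is not allowed in a standard module), a project reference to the Msxml type library, and a file named reports.xml in the current directory. A readyState value of 4 means loading has completed.

```vb
' Assumes a project reference to the Msxml type library.
' Place this in a form or class module; WithEvents cannot be used
' in a standard module.
Private WithEvents xmlDoc As DOMDocument

Private Sub Form_Load()
    Set xmlDoc = New DOMDocument
    xmlDoc.async = True           ' The default, stated here for clarity.
    xmlDoc.load "reports.xml"     ' Returns before parsing has finished.
End Sub

' Fires each time the document's readyState property changes.
Private Sub xmlDoc_onreadystatechange()
    If xmlDoc.readyState = 4 Then ' 4 = loading completed
        If xmlDoc.parseError.errorCode = 0 Then
            MsgBox xmlDoc.documentElement.nodeName
        Else
            MsgBox xmlDoc.parseError.reason
        End If
    End If
End Sub
```

For short scripts it is usually simpler to set async to False, as the samples in this tutorial do.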
Accessing Document and Error Information
You can retrieve the document type (DTD) used by the XML document, the path or
URL of the file that was loaded, and a string containing the entire contents
of the XML document.
You can also get detailed information on errors that occurred during parsing.
Dim xmlDoc As New DOMDocument
Dim myDocType As IXMLDOMDocumentType
xmlDoc.async = False 'Wait for the load to finish before continuing.
xmlDoc.load("reports.xml")
If xmlDoc.parseError.errorCode <> 0 Then
    MsgBox ("A parse error occurred.")
Else
    Set myDocType = xmlDoc.doctype
    If Not myDocType Is Nothing Then
        MsgBox (myDocType.name) 'Display the name of the document type.
    End If
    MsgBox (xmlDoc.url) 'Path or URL of the XML file.
    MsgBox xmlDoc.documentElement.xml 'Display the actual XML data.
End If
Here is some more error information that is available:
Error Property | Description
errorCode | Error code of the last parse error
filepos | Absolute file position of the error
line | Line the error occurred on
linepos | Position within the line
reason | Error description
srcText | Line of XML that contains the error
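The properties in the table can be combined into a single diagnostic message. This is only a sketch: DescribeParseError is a hypothetical helper name, the XML string is made up and deliberately malformed, and the code assumes a project reference to the Msxml type library.

```vb
' Hypothetical helper: builds a readable message from the parseError
' object, or returns an empty string if the last parse succeeded.
Function DescribeParseError(ByVal xmlDoc As DOMDocument) As String
    Dim pe As IXMLDOMParseError
    Set pe = xmlDoc.parseError
    If pe.errorCode = 0 Then
        DescribeParseError = ""
    Else
        DescribeParseError = "Error " & pe.errorCode & _
            " at line " & pe.line & ", position " & pe.linepos & _
            ": " & pe.reason & vbCrLf & pe.srcText
    End If
End Function

Sub ShowParseError()
    Dim xmlDoc As New DOMDocument
    xmlDoc.async = False
    ' The closing tag is deliberately wrong so that parsing fails.
    xmlDoc.loadXML "<customer><first_name>Joe</last_name></customer>"
    MsgBox DescribeParseError(xmlDoc)
End Sub
```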
Accessing the DOM Tree
You can access the tree starting at the root and walking down the tree or by
querying for a specific node. You navigate to the root element using the
documentElement property which returns the root element as an
XMLDOMNode object.
Dim xmlDoc As New DOMDocument
Dim root As IXMLDOMElement
Dim child As IXMLDOMNode
xmlDoc.async = False 'Make sure the tree is fully loaded before walking it.
xmlDoc.load("reports.xml")
'Set root to the root element.
Set root = xmlDoc.documentElement
'Walk from the root to each of its child nodes.
For Each child In root.childNodes
    MsgBox child.text
Next
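The loop above visits only the root's immediate children. Walking the whole tree takes recursion, which might be sketched as follows. OutlineNode is a hypothetical helper name, the XML string is made up, and a project reference to the Msxml type library is assumed. Attributes do not appear in the outline because they are not child nodes.

```vb
' Hypothetical helper: returns an indented outline of a node and all
' of its descendants, one line per node.
Function OutlineNode(ByVal node As IXMLDOMNode, _
                     Optional ByVal depth As Long = 0) As String
    Dim child As IXMLDOMNode
    Dim result As String
    result = Space$(depth * 2) & node.nodeName & _
        " (" & node.nodeTypeString & ")" & vbCrLf
    ' Recurse into each child, indenting one level deeper.
    For Each child In node.childNodes
        result = result & OutlineNode(child, depth + 1)
    Next
    OutlineNode = result
End Function

Sub ShowOutline()
    Dim xmlDoc As New DOMDocument
    xmlDoc.async = False
    xmlDoc.loadXML "<customer><first_name>Joe</first_name>" & _
        "<last_name>Smith</last_name></customer>"
    MsgBox OutlineNode(xmlDoc.documentElement)
End Sub
```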
To navigate to a specific node in the tree use the getElementsByTagName
method. This method takes a string containing a specific tag name and returns
all element nodes with this tag name.
Dim ElemList As IXMLDOMNodeList
Dim xmlDoc As New DOMDocument
Dim i As Long
xmlDoc.async = False
xmlDoc.load("reports.xml")
Set ElemList = xmlDoc.getElementsByTagName("AUTHOR")
For i = 0 To (ElemList.length - 1)
    MsgBox ElemList.item(i).xml
Next
Creating Nodes
The DOMDocument object provides a generic createNode
method that lets you create nodes by supplying a node type, name, and
namespaceURI. I say generic because it also provides individual methods
to create most of the following specific node types.
Node Type | Value | Description
Node_Element | 1 | Node is an element
Node_Attribute | 2 | Node is an attribute of an element
Node_Text | 3 | Node represents the text content of a tag
Node_Cdata_Section | 4 | A CDATA section in the XML source. CDATA sections escape text that would otherwise be interpreted as markup.
Node_Entity_Reference | 5 | A reference to an entity in the XML document
Node_Entity | 6 | Node represents an expanded entity
Node_Processing_Instruction | 7 | A processing instruction from the XML document
Node_Comment | 8 | Node represents a comment in the XML document
Node_Document | 9 | Represents a document object, which, as the root of the document tree, provides access to the entire XML document
Node_Document_Type | 10 | Represents the document type declaration (DTD), indicated by the <!DOCTYPE> tag
Node_Document_Fragment | 11 | A document fragment node associates a node or subtree with a document without actually being contained within the document
Node_Notation | 12 | Represents a notation in the document type declaration (DTD)
Here's an example of creating an attribute node:
Dim xmlDoc As New DomDocument
Dim MyNode As IXMLDOMNode
xmlDoc.load("C:\books.xml")
Set MyNode = xmlDoc.createNode(2, "XML", "")
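To show how the generic createNode method relates to the type-specific methods, here is a small sketch that builds a document from scratch. BuildCustomerXml and the element and attribute names are made up for illustration, and a project reference to the Msxml type library is assumed. Note that a newly created node is not part of the tree until it is attached to a parent.

```vb
' Hypothetical helper: builds a tiny document and returns its XML.
Function BuildCustomerXml() As String
    Dim xmlDoc As New DOMDocument
    Dim root As IXMLDOMNode
    Dim nameElem As IXMLDOMNode
    Dim idAtt As IXMLDOMNode

    ' Generic form: node type 1 is Node_Element.
    Set nameElem = xmlDoc.createNode(1, "first_name", "")
    ' Type-specific forms for an element and an attribute.
    Set root = xmlDoc.createElement("customer")
    Set idAtt = xmlDoc.createAttribute("ID")
    idAtt.nodeValue = "42"

    ' Attach the new nodes; until now they were not in the tree.
    xmlDoc.appendChild root
    root.appendChild nameElem
    nameElem.appendChild xmlDoc.createTextNode("Joe")
    root.attributes.setNamedItem idAtt

    BuildCustomerXml = xmlDoc.xml
End Function
```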
The XMLDOMNode object is the main object within a
DOM. The DOMDocument object is itself an XMLDOMNode. So are the members
of node lists and named node maps which are discussed later.
Accessing Node Information
The XMLDOMNode object has several properties which provide info about a node.
Here are the simpler ones:
Node Property | Description
hasChildNodes | True if this node has children
namespaceURI | Returns the URI (uniform resource identifier) for the namespace (the "uuu" portion of the namespace declaration xmlns:nnn="uuu").
parsed | True if the node and all its descendants have been parsed and instantiated. During asynchronous access, not all of the document tree may be available. Before performing XSL transformations or pattern-matching operations, it is useful to know whether the entire tree below this node is available for processing.
xml | Returns a string containing the XML representation of the node and all its descendants.
nodeName | Returns the qualified name for the element, attribute, or entity reference. Ex: returns xxx:yyy for the element <xxx:yyy>. The return value depends on the nodeType.
nodeType | Returns an integer representing the XML DOM node type.
nodeTypeString | Returns a string representing the XML DOM node type.
specified | Returns True if the attribute is explicitly specified in the element. Returns False if the attribute value comes from the DTD or schema. Returns True on non-attribute nodes.
This example illustrates a few of the above properties. It checks whether the
first child of the root node has children of its own and, if so, displays how
many.
Dim xmlDoc As New DOMDocument
Dim currNode As IXMLDOMNode
Dim strXML As String
xmlDoc.async = False
xmlDoc.load("c:\books.xml")
Set currNode = xmlDoc.documentElement.firstChild
strXML = currNode.xml
MsgBox currNode.namespaceURI
If currNode.parsed Then
    MsgBox ("node was parsed")
End If
If currNode.hasChildNodes Then
    MsgBox currNode.childNodes.length
Else
    MsgBox ("no child nodes")
End If
Setting Node Information
The data in an XML file is exposed in the DOM as node values.
Node values might be the value of an attribute or the text within an XML
element.
The nodeValue property provides access to values of attributes,
text nodes, comments, processing instructions, and CDATA section
nodes.
To get the value of an element type node, you can navigate to its element's
children (the text nodes within) and call nodeValue on them or use the
text property.
This code sets the value of an attribute and an element. It assumes xmlDoc is
a loaded DOMDocument and elem1 is an element node from that document.
Set newAttNode = xmlDoc.createAttribute("newAtt")
newAttNode.nodeValue = "hello world"
If (elem1.text = "hello world") Then
    elem1.text = "hi! world"
End If
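The difference between nodeValue and text is easiest to see side by side. In this sketch the XML string and the ElementValueDemo name are made up, and a project reference to the Msxml type library is assumed. In Msxml, nodeValue on an element node is Null; the characters live in the element's child text node.

```vb
' Hypothetical demo: collects the results of reading values in
' different ways and returns them as one string.
Function ElementValueDemo() As String
    Dim xmlDoc As New DOMDocument
    Dim elem As IXMLDOMNode
    Dim result As String

    xmlDoc.async = False
    xmlDoc.loadXML "<customer ID=""7""><first_name>Joe</first_name></customer>"
    Set elem = xmlDoc.documentElement.firstChild   ' the first_name element

    ' An element node itself has no nodeValue.
    result = "element nodeValue is Null: " & IsNull(elem.nodeValue) & vbCrLf
    ' The text is held by the child text node...
    result = result & "text node: " & elem.firstChild.nodeValue & vbCrLf
    ' ...or can be read directly through the text property.
    result = result & "text property: " & elem.text & vbCrLf
    ' Attribute values are available through nodeValue.
    result = result & "attribute: " & _
        xmlDoc.documentElement.attributes.getNamedItem("ID").nodeValue

    ElementValueDemo = result
End Function
```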
Navigating Through Nodes
From the XMLDOMNode object, you can navigate to its parent node (parentNode),
its children (childNodes, firstChild, lastChild), its siblings
(previousSibling, nextSibling), or the document object the node belongs to
(ownerDocument).
If the node type is element, attribute, or entityReference,
you can use the definition property to navigate
to the schema definition of the node.
If the node type is element, processingInstruction,
documentType, entity, or notation, you can navigate to the
attributes on the node using the attributes property.
These properties return the indicated node, or null (Nothing in VB) if the
node doesn't exist.
This example illustrates how to navigate the DOM tree.
Dim xmlDoc As New DOMDocument
Dim currNode As IXMLDOMNode
Dim newNode As IXMLDOMNode
Dim rootNode As IXMLDOMNode
Dim oNodeList As IXMLDOMNodeList
xmlDoc.async = False
xmlDoc.load("c:\books.xml")
Set rootNode = xmlDoc.documentElement
'
' Get a node's parent and display its XML.
'
Set currNode = xmlDoc.documentElement.childNodes.item(1).childNodes.item(0)
Set newNode = currNode.parentNode
MsgBox newNode.xml
'
' Display the XML for the root node's first child.
'
Set currNode = xmlDoc.documentElement.firstChild
MsgBox currNode.xml
'
' Create a new element and insert it before the last child of the top-level
' node.
'
Set newNode = xmlDoc.createNode (1, "VIDEOS", "")
Set currNode = rootNode.insertBefore(newNode, rootNode.lastChild)
'
' Get a list of the root's children and display the XML for each child.
'
Set oNodeList = rootNode.childNodes
For Each currNode in oNodeList
MsgBox currNode.xml
Next
'
' Get a node, get its previous sibling, display its XML.
'
Set currNode = xmlDoc.documentElement.childNodes.item(1)
Set newNode = currNode.previousSibling
MsgBox newNode.xml
You can also navigate to other nodes in the tree
using the selectNodes and selectSingleNode
methods. These methods take an XSL Pattern as an argument and return the
node or nodes that match that query. For more information about XSL Patterns,
see my XSL Tutorial.
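As a rough illustration of these two methods, the sketch below queries a small made-up document. JoinAuthorsByPattern is a hypothetical name, and a project reference to the Msxml type library is assumed. The two patterns used here are simple enough that they should behave the same whichever query syntax the parser version applies.

```vb
' Hypothetical demo: returns the authors found by selectNodes followed
' by the first title found by selectSingleNode.
Function JoinAuthorsByPattern() As String
    Dim xmlDoc As New DOMDocument
    Dim authors As IXMLDOMNodeList
    Dim firstTitle As IXMLDOMNode
    Dim node As IXMLDOMNode
    Dim result As String

    xmlDoc.async = False
    xmlDoc.loadXML "<BOOKS>" & _
        "<BOOK><TITLE>First</TITLE><AUTHOR>Ann</AUTHOR></BOOK>" & _
        "<BOOK><TITLE>Second</TITLE><AUTHOR>Bob</AUTHOR></BOOK>" & _
        "</BOOKS>"

    ' The pattern is evaluated relative to the node the method is called on.
    Set authors = xmlDoc.documentElement.selectNodes("BOOK/AUTHOR")
    For Each node In authors
        result = result & node.text & ";"
    Next

    ' selectSingleNode returns Nothing when no node matches.
    Set firstTitle = xmlDoc.selectSingleNode("//TITLE")
    If Not firstTitle Is Nothing Then
        result = result & firstTitle.text
    End If

    JoinAuthorsByPattern = result
End Function
```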
Manipulating the Children of a Node
There are four methods that let you manipulate the children of a node. Each one
takes a node object as an argument. They are: appendChild,
replaceChild, removeChild and insertBefore.
Dim xmlDoc As New DOMDocument
Dim refNode As IXMLDOMNode
Dim newNode As IXMLDOMNode
Dim root As IXMLDOMNode
xmlDoc.async = False
xmlDoc.load("c:\books.xml")
Set root = xmlDoc.documentElement
'
' Create a new "PAGES" node. Insert it before the first child of the root's
' second child. Display the resulting XML.
'
Set newNode = xmlDoc.createElement("PAGES")
Set refNode = root.childNodes.item(1).firstChild
root.childNodes.item(1).insertBefore newNode, refNode
MsgBox root.childNodes.item(1).xml
'
' Remove a child node.
'
Set refNode = root.childNodes.item(1).firstChild
root.childNodes.item(1).removeChild refNode
MsgBox root.childNodes.item(1).xml
'
' Replace the specified child with the new "PAGES" node.
'
Set newNode = xmlDoc.createElement("PAGES")
root.childNodes.item(1).replaceChild newNode, _
    root.childNodes.item(1).childNodes.item(0)
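Of the four methods, appendChild is the only one the example above does not exercise. A minimal sketch of it follows, using a made-up XML string and a hypothetical AppendPages helper, and assuming a project reference to the Msxml type library. The new node becomes the last child of the node that appendChild is called on.

```vb
' Hypothetical demo: appends a PAGES element to a BOOK element and
' returns the resulting XML of that BOOK.
Function AppendPages() As String
    Dim xmlDoc As New DOMDocument
    Dim book As IXMLDOMNode
    Dim pages As IXMLDOMNode

    xmlDoc.async = False
    xmlDoc.loadXML "<BOOKS><BOOK><TITLE>First</TITLE></BOOK></BOOKS>"
    Set book = xmlDoc.documentElement.firstChild

    Set pages = xmlDoc.createElement("PAGES")
    pages.text = "250"
    book.appendChild pages   ' PAGES is now the last child of BOOK.

    AppendPages = book.xml
End Function
```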
The XMLDOMNodeList object is a collection of
nodes. It is returned by the childNodes property and by the
selectNodes and getElementsByTagName methods.
You can iterate sequentially through the nodes in the list with For Each, as
shown in the earlier examples, or by using the
nextNode method shown below. The length
property indicates the number of nodes in the list.
Dim xmlDoc As New DOMDocument
Dim currNode As IXMLDOMNode
Dim oNodeList As IXMLDOMNodeList
xmlDoc.async = False
xmlDoc.load("c:\books.xml")
'
' Get a list of the nodes and display their text.
'
Set oNodeList = xmlDoc.getElementsByTagName("AUTHOR")
For i = 0 TO (oNodeList.length -1)
Set currNode = oNodeList.nextNode
MsgBox currNode.text
Next
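An alternative to counting with length is to call nextNode until it returns Nothing, which marks the end of the list. The sketch below assumes a made-up XML string, a hypothetical JoinAuthors helper, and a project reference to the Msxml type library. The reset method rewinds the list's iterator so the list can be walked again.

```vb
' Hypothetical demo: walks a node list twice with nextNode and returns
' the author names found on both passes.
Function JoinAuthors() As String
    Dim xmlDoc As New DOMDocument
    Dim list As IXMLDOMNodeList
    Dim node As IXMLDOMNode
    Dim result As String
    Dim pass As Long

    xmlDoc.async = False
    xmlDoc.loadXML "<BOOKS><AUTHOR>Ann</AUTHOR><AUTHOR>Bob</AUTHOR></BOOKS>"
    Set list = xmlDoc.getElementsByTagName("AUTHOR")

    For pass = 1 To 2
        Set node = list.nextNode
        Do While Not node Is Nothing   ' Nothing signals the end of the list.
            result = result & node.text & ";"
            Set node = list.nextNode
        Loop
        list.reset                     ' Rewind before the next pass.
    Next

    JoinAuthors = result
End Function
```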
To access nodes randomly, use the item method.
This allows you to navigate directly to a specific node. The first node has an
index of zero.
Dim xmlDoc As New DOMDocument
Dim oNodeList As IXMLDOMNodeList
xmlDoc.async = False
xmlDoc.load("c:\books.xml")
'
' Get a list of the nodes and display their text.
'
Set oNodeList = xmlDoc.getElementsByTagName("AUTHOR")
For i = 0 TO (oNodeList.length -1)
MsgBox oNodeList(i).text
Next
An XMLDOMNamedNodeMap object is returned by the
attributes property. The XMLDOMNamedNodeMap object differs from the
node list because it is a collection of nodes that can also be accessed by
name.
Just like a node list, a named node map has a length property and can be
accessed using its item method. It also exposes the nextNode method.
However, you can also access the members of a named node map by name using
getNamedItem and getQualifiedItem. The
getNamedItem method takes the name of the desired node as a parameter; the
getQualifiedItem method takes the name and namespaceURI of the desired node.
Each method returns a node object.
This code gets the value of the ID attribute on the elem1 element and assigns
that value to the variable "idValue".
idValue = elem1.attributes.getNamedItem("ID").nodeValue
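The one-line example above fails with a run-time error if the element has no ID attribute, because getNamedItem then returns Nothing. A slightly more defensive sketch, together with a loop over every attribute, might look like this. AttributeOrDefault and ListAttributes are hypothetical helper names, and a project reference to the Msxml type library is assumed.

```vb
' Hypothetical helper: returns the value of the named attribute, or
' the fallback string if the element has no such attribute.
Function AttributeOrDefault(ByVal elem As IXMLDOMNode, _
                            ByVal attrName As String, _
                            Optional ByVal fallback As String = "") As String
    Dim att As IXMLDOMNode
    Set att = elem.attributes.getNamedItem(attrName)
    If att Is Nothing Then
        AttributeOrDefault = fallback
    Else
        AttributeOrDefault = att.nodeValue
    End If
End Function

' Hypothetical helper: lists every attribute of an element as
' name=value, one per line.
Function ListAttributes(ByVal elem As IXMLDOMNode) As String
    Dim i As Long
    Dim result As String
    For i = 0 To elem.attributes.length - 1
        With elem.attributes.item(i)
            result = result & .nodeName & "=" & .nodeValue & vbCrLf
        End With
    Next
    ListAttributes = result
End Function
```

For example, AttributeOrDefault(elem1, "ID", "none") yields the ID value when the attribute is present and "none" otherwise.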
Manipulating a Named Node Map
These methods allow you to manipulate named node maps: setNamedItem,
removeNamedItem and removeQualifiedItem.
The setNamedItem method takes an XML node object as a parameter, adding that
node to the named node map. If an attribute already exists with the same name,
the old attribute is replaced. This example creates a new attribute node with
the name "ID" and adds it to the attributes of elem1:
Set idAtt = xmlDoc.createAttribute("ID")
elem1.attributes.setNamedItem idAtt
The removeNamedItem method takes a node name as a parameter, removing the node
with that name. The removeQualifiedItem method takes a node name and
namespaceURI as its parameters, removing the corresponding attribute.
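Putting setNamedItem and removeNamedItem together, a short sketch might look like the following. The XML string and the UpdateAttributes name are made up for illustration, and a project reference to the Msxml type library is assumed.

```vb
' Hypothetical demo: replaces the ID attribute, removes the status
' attribute, and returns the element's resulting XML.
Function UpdateAttributes() As String
    Dim xmlDoc As New DOMDocument
    Dim elem1 As IXMLDOMNode
    Dim idAtt As IXMLDOMNode

    xmlDoc.async = False
    xmlDoc.loadXML "<customer ID=""1"" status=""new""/>"
    Set elem1 = xmlDoc.documentElement

    ' An attribute named ID already exists, so setNamedItem replaces it.
    Set idAtt = xmlDoc.createAttribute("ID")
    idAtt.nodeValue = "2"
    elem1.attributes.setNamedItem idAtt

    ' Remove the status attribute by name.
    elem1.attributes.removeNamedItem "status"

    UpdateAttributes = elem1.xml
End Function
```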