An Extensible Markup Language (XML) Tutorial
This tutorial covers the basics of XML and many of the common
features and terms associated with XML. After completing this tutorial you
should have a general understanding of XML and how and why to use it.
Discussions contained herein focus on Microsoft's implementation of XML. Thus,
most samples require version 5.0 or later of their Internet Explorer browser.
To learn more about Microsoft's XML products,
visit their site.
XML stands for Extensible
Markup Language.
XML was designed to describe structured data. It's a markup language similar to
HyperText Markup Language (HTML). XML is a simplified subset of Standard
Generalized Markup Language (SGML), while HTML was originally defined as an
application of SGML.
Unlike HTML, XML tags are not predefined. You make up your own
unlimited set of tags. This is why it is extensible.
XML is a meta-markup language: rather than fixing a vocabulary, it gives you
rules for defining your own tags, and well-chosen tag names make a document
largely self-describing. Since you make up your own tags, an XML document can
use a Document Type Definition (DTD) to describe its structure to the
applications that use it.
XML was designed to describe data and focus on what the data is.
HTML was designed to display data and focus on how the data looks. XML data can
be viewed in a browser or it can be passed to other applications for processing
and viewing.
XML standards are defined by the
World Wide Web Consortium (W3C), ensuring that XML will be uniform and
independent of applications or vendors. The W3C site is the most complete
reference of XML available.
Data Separation
XML can keep data separated from your HTML. HTML pages are used to display
data. Data is often stored inside HTML pages. With XML this data can be stored
in a separate XML file. Thus, you can concentrate on using HTML for formatting
and display, and be sure that changes in the underlying data will not force
changes to any of your HTML code.
XML data can also be stored inside HTML pages as Data
Islands. You can still concentrate on using HTML for formatting and
displaying the data.
Communication
Different computer systems typically store data in incompatible formats, which
makes exchanging data between such systems difficult. Converting the data to
XML greatly reduces this difficulty, since the data can be read by many
different types of applications. XML can also be used to store data in files or
databases.
XML is a set of rules for creating semantic tags used to
describe data. An XML element is made up of a
start and end tag with data in between.
The tags describe the data. The data is called the value
of the element. For example, this XML element is a <director> element
with the value "Bill Smith."
<director>Bill Smith</director>
The element's name is "director" and allows you to mark up the
value "Bill Smith" so you can differentiate it from another similar piece of
data. Consider another element with the value "Bill Smith".
<actor>Bill Smith</actor>
Since each element has a different tag name, you can see one
refers to Bill Smith, the director, while the other refers to Bill Smith, the
actor.
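To see how a program tells the two elements apart, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python is an assumption made for illustration; the tutorial itself targets Internet Explorer.

```python
import xml.etree.ElementTree as ET

# Two elements with the same value but different tag names.
director = ET.fromstring("<director>Bill Smith</director>")
actor = ET.fromstring("<actor>Bill Smith</actor>")

for element in (director, actor):
    # .tag holds the element name, .text holds the element's value.
    print(f"{element.tag}: {element.text}")
```

The value is identical in both cases; only the tag name carries the distinction.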
A basic XML document is simply an XML element that can - but
might not - include nested XML elements.
Here is an example of an XML document:
<?xml version="1.0"?>
<message>
<to>Dave</to>
<from>Susan</from>
<subject>Reminder</subject>
<text>Don't forget to buy milk on the way home.</text>
</message>
The first line of the document is the XML
declaration and should always be included. It defines the XML
version of the document. In this case the document conforms to the 1.0
specification of XML:
<?xml version="1.0"?>
The next line defines the first or root element of
the document:
<message>
The following 4 lines define 4 child elements of
the root:
<to>Dave</to>
<from>Susan</from>
<subject>Reminder</subject>
<text>Don't forget to buy milk on the way home.</text>
The last line defines the end of the root element:
</message>
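The same structure can be inspected programmatically. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module (not part of the original tutorial), that parses the message document and lists the root element and its children.

```python
import xml.etree.ElementTree as ET

MESSAGE_XML = """<?xml version="1.0"?>
<message>
  <to>Dave</to>
  <from>Susan</from>
  <subject>Reminder</subject>
  <text>Don't forget to buy milk on the way home.</text>
</message>"""

# fromstring() returns the root element of the parsed document.
root = ET.fromstring(MESSAGE_XML)
print(root.tag)

# Iterating over an element yields its direct children in document order.
for child in root:
    print(f"  {child.tag} = {child.text}")
```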
XML documents must adhere to the following strict syntax rules.
- XML elements must have a closing tag
Some HTML elements, such as the paragraph (<p>), don't need a closing
tag. However, all XML elements must have a closing tag.
- Empty Elements
XML allows empty elements with this shorthand notation:
<title></title> Normal notation
<title/> Shorthand notation
- Tags must be properly nested
Overlapping elements are not allowed. An element opened inside another element
must be closed before its enclosing element is closed.
<b><i>This text is bold and italic</b></i> This is incorrect
<b><i>This text is bold and italic</i></b> This is correct
- XML tags are case sensitive
The following specify different elements:
<City> <CITY> <city>
<City>This is incorrect</city>
<city>This is correct</city>
- XML documents must have a root tag
All XML documents must contain exactly one tag pair that defines the root
element. All other elements must be nested within the root element. Any
element can have sub (child) elements. Sub
elements must be in pairs and correctly nested within their parent element.
<root>
  <child>
    <subchild>
    </subchild>
  </child>
</root>
- Attribute values must always be quoted
An element can optionally contain one or more attributes
in its start tag. An attribute is a name-value pair separated by an equal sign
(=). Attribute values must always be quoted.
<CITY ZIP="01085">Westfield</CITY>
ZIP="01085" is an attribute of the <CITY> element.
Attributes are used to attach additional, secondary information to an element.
Attributes can also be given default values (for example, in a DTD), while
elements cannot. Each attribute of an element can be specified only once, but
attributes can appear in any order.
<message date="12/11/99"> This is correct
<message date=12/11/99> This is incorrect
<message ID="100"> The ID attribute can be used to tell messages apart
<message ID="101">
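A conforming XML parser enforces the rules above and rejects any document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are hypothetical and chosen only to mirror the rules in the list.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; labels describe what each string demonstrates.
samples = {
    "missing closing tag": "<p>text",
    "overlapping tags": "<b><i>text</b></i>",
    "properly nested tags": "<b><i>text</i></b>",
    "case mismatch": "<City>text</city>",
    "two root elements": "<a></a><b></b>",
    "unquoted attribute": "<message date=12/11/99></message>",
    "quoted attribute": '<message date="12/11/99"></message>',
    "empty element shorthand": "<title/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the properly nested, quoted-attribute, and empty-element samples are accepted; each of the others raises a parse error.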
A Valid XML document is a Well Formed XML document (one that follows the
syntax rules above) that also adheres to the rules of a Document
Type Definition (DTD). A DTD defines the legal elements and attributes of an
XML document and how they may be nested. DTDs can be inline in your XML
document or externally referenced.
This XML document has a reference to an external DTD.
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "InternalMessage.dtd">
<message>
<to>Dave</to>
<from>Susan</from>
<subject>Reminder</subject>
<text>Don't forget to buy milk on the way home.</text>
</message>
Read more about document type definitions.
A data island is an XML document
that exists within an HTML page. It lets you script against the XML document
without having to load it through script or through the <OBJECT> tag.
Almost any well-formed XML document can be inside a data island. Data islands
can be inline or external.
The <XML> element marks the beginning of the data island.
Its ID attribute provides a way to reference the
data island. The SRC attribute is used to identify
the external XML file.
<XML ID="XMLID" SRC="customer.xml"></XML>
You can also use the <SCRIPT> tag to create a data island:
<SCRIPT LANGUAGE="xml" ID="XMLID">
<customer>
<name>Bill Smith</name>
<custID>12345</custID>
</customer>
</SCRIPT>
Here is a complete example of an inline data island bound to the HTML:
<HTML><Head><Title></Title></Head>
<Body>
<XML ID="XMLID">
<customers>
<customer>
<name>Bill Smith</name>
<custID>100</custID>
</customer>
<customer>
<name>John Doe</name>
<custID>101</custID>
</customer>
<customer>
<name>Lisa Longo</name>
<custID>102</custID>
</customer>
</customers>
</XML>
<table datasrc="#XMLID">
<tr>
<td><div datafld="name"></div></td>
<td><div datafld="custID"></div></td>
</tr>
</table>
</Body>
</HTML>
The <XML> tag's ID
attribute is used to reference the data island in the HTML. Using HTML tags
that accept data-binding attributes (i.e. that bind the
HTML to the XML data), you can easily format and display the XML data. This
HTML page displays the XML data in a table.
The <table> tag uses the DATASRC
attribute to refer to the inline XML data island whose ID attribute is XMLID.
The <TD> element itself can't be bound to data but the <div> tag
can. The DATAFLD attribute indicates which XML
element to place in the cell of the table. As the XML is read, an additional
table row is created for each <customer> element.
The functionality within Internet Explorer to bind XML data to
HTML is called the Data Source Object (DSO).
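Data binding is specific to Internet Explorer, but its effect (one table row per record, one cell per bound field) is easy to imitate. The following is a rough sketch in Python's standard library, not a description of how the DSO works internally; the function name render_table and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

CUSTOMERS_XML = """<customers>
  <customer><name>Bill Smith</name><custID>100</custID></customer>
  <customer><name>John Doe</name><custID>101</custID></customer>
  <customer><name>Lisa Longo</name><custID>102</custID></customer>
</customers>"""

def render_table(xml_text, fields):
    """Emit one <tr> per child of the root and one <td> per requested field."""
    rows = []
    for record in ET.fromstring(xml_text):
        # findtext() returns the text of the first matching child element.
        cells = "".join(
            f"<td>{escape(record.findtext(field, default=''))}</td>"
            for field in fields
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(render_table(CUSTOMERS_XML, ["name", "custID"]))
```

The fields list plays the role of the DATAFLD attributes: it names which child element of each record fills each cell.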
A namespace is a collection of
names used as element or attribute names in an XML document. A namespace
qualifies element names to make them unique on the Web to avoid conflicts
between elements with the same name.
A namespace is identified by a Uniform
Resource Identifier (URI), which can be either a Uniform
Resource Locator (URL) or a Uniform Resource
Name (URN). It doesn't matter what the URI points to, or whether it points to
anything at all. URIs are used because they can be chosen to be globally
unique across the Internet.
Namespaces can be declared explicitly
or by default. With explicit declarations, you
define a prefix to qualify elements belonging to that namespace.
Here's an explicit declaration which defines the "bk" and
"money" namespace prefixes. The xmlns attribute is
an XML keyword for a namespace declaration. Elements and attributes starting
with "bk:" or "money:" are from the "urn:BookLovers.org:BookInfo" and
"urn:Finance:Money" namespaces, respectively.
<BOOKS>
  <bk:BOOK
      xmlns:bk="urn:BookLovers.org:BookInfo"
      xmlns:money="urn:Finance:Money">
    <bk:title>A Suitable Boy</bk:title>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>
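A namespace-aware parser resolves each prefix to its URI, so the prefix itself is only shorthand. As an illustration, this sketch assumes Python's standard xml.etree.ElementTree module, which reports qualified names as the URI in braces followed by the local name; the lookup prefixes "b" and "m" are arbitrary and deliberately differ from those in the document.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:title>A Suitable Boy</bk:title>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>"""

root = ET.fromstring(BOOKS_XML)
book = root[0]
print(root.tag)   # BOOKS has no prefix and no default namespace applies
print(book.tag)   # the bk: prefix is replaced by its namespace URI

# The prefixes used for lookup need not match the document's prefixes;
# only the URIs have to agree.
ns = {"b": "urn:BookLovers.org:BookInfo", "m": "urn:Finance:Money"}
price = root.find("b:BOOK/b:PRICE", ns)
print(price.text, price.get("{urn:Finance:Money}currency"))
```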
Default declarations define a namespace to be used for all
elements within its scope. No prefix is used. A namespace declared without a
prefix becomes the default namespace for the element on which it is declared
and for that element's descendants. All elements in that scope that don't have
a prefix belong to the default namespace. Unprefixed attributes are not
affected by it; they belong to no namespace.
<BOOK xmlns="urn:BookLovers.org:BookInfo">
  <title>A Suitable Boy</title>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
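The effect of a default declaration can be checked the same way. This is a minimal sketch, again assuming Python's standard xml.etree.ElementTree module, that parses the BOOK document written out in full. Note that in the parsed result the unprefixed elements carry the default namespace, while the unprefixed currency attribute comes back with no namespace.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<BOOK xmlns="urn:BookLovers.org:BookInfo">
  <title>A Suitable Boy</title>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>"""

book = ET.fromstring(BOOK_XML)
print(book.tag)  # the unprefixed element is in the default namespace

# A lookup by bare local name finds nothing, because the element's
# full name includes the namespace URI.
print(book.find("PRICE"))

price = book.find("{urn:BookLovers.org:BookInfo}PRICE")
print(price.attrib)  # the attribute name has no namespace part
```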
Viewing XML With Internet Explorer 5+
You can use IE5+ to view an XML document. To open an XML
document, click on a link to an XML file, type its URL in the address bar, or
double-click on an XML document in a folder.
When you display an XML document in Explorer, IE shows the
document with its root element and child elements expanded. Use the plus (+)
and minus (-) signs to the left of the XML elements to expand or collapse
the element structure.
Note: If you are not using IE5+, all bets are off.
XML's goal is to separate data from its presentation. So then,
how do you display the data in a neat format? You can use Cascading
Style Sheets (CSS) just as you would to format HTML.
The CSS associates formatting properties with the XML tags,
allowing the CSS to decorate the existing XML tree structure. The problem
is that CSS can only decorate the tree as it is, so you must design the XML
tree structure with display in mind. This violates the idea of separating
data from presentation.
The solution is to use Extensible Stylesheet
Language (XSL) instead. XSL lets you transform the XML tree
into a new tree without changing the XML source. Then the XML can be displayed
differently just by switching style sheets.
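The idea of producing a new tree while leaving the source untouched can be illustrated without XSL. Python's standard library does not include an XSLT processor, so the sketch below is not XSL; it is plain Python, with a hypothetical function to_html_list, that only mimics the "source tree in, result tree out" model.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = """<message>
  <to>Dave</to>
  <from>Susan</from>
  <subject>Reminder</subject>
</message>"""

def to_html_list(source):
    """Build a new <ul> tree from the source tree; the source is only read."""
    ul = ET.Element("ul")
    for child in source:
        li = ET.SubElement(ul, "li")
        li.text = f"{child.tag}: {child.text}"
    return ul

source = ET.fromstring(SOURCE_XML)
result = to_html_list(source)
print(ET.tostring(result, encoding="unicode"))
```

Swapping in a different function would present the same source differently, which is the role a style sheet plays in XSL.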
Read more about the extensible stylesheet
language.
We saw how to use data islands to include
XML data in your HTML page, how to display the data using cascading
style sheets and how to qualify XML data with namespaces.
Using all of these features you can embed HTML tags into your
XML data and format the XML data for display.
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="AllTogether.css" ?>
<COURSE xmlns:HTML="http://www.w3.org/TR/REC-html40">
  <title>Putting it All Together</title>
  <HTML:UL>
    <HTML:LI>Line one</HTML:LI>
    <HTML:LI>Line two</HTML:LI>
  </HTML:UL>
  <HTML:BR />
  <HTML:IMG src="MyImage.jpg" />
</COURSE>
The special HTML namespace used has
a predefined meaning in the browser. It instructs the browser to interpret any
content in the HTML namespace as HTML rather than XML and to render it as such.
Click the AllTogether CSS link to view it and notice it uses
Media Styles. This allows you to specify one set of styles to be
applied to online content and a different set to be used when IE prints the
page.