An XML Schema Tutorial
This tutorial covers the basics of XML Schemas. Before reading
it you should already be familiar with XML and DTDs. If you are not, you may
want to read my XML and DTD tutorials first.
Much like Document Type Definitions (DTDs), Schemas
define the elements that can appear in an XML document and the attributes that
can be associated with those elements.
Schemas define the document's structure: which elements are children of others,
the order in which child elements can appear, and the number of child elements.
Schemas specify whether an element is empty or can include text. They can also
specify default values for attributes.
Schemas are more powerful and flexible than DTDs and use XML syntax.
Independent developers can agree to use a common Schema for exchanging XML data.
Your application can use this agreed-upon Schema to verify the data it
receives. Verifying an XML document against a schema is known as
validating the document.
Schema standards are defined by the
World Wide Web Consortium (W3C). The W3C site provides a comprehensive
reference of XML schemas.
However, this tutorial focuses on Microsoft's implementation of
schemas. All samples require Internet Explorer 5.0 or later, which
includes Microsoft's Msxml parser. All references to the Msxml parser,
either in text or in sample code, assume Msxml version 2.5 or later. For more
information, or to download Microsoft's XML products,
visit Microsoft's site.
To use a schema in an XML document, add a schema namespace
declaration:
<book xmlns="x-schema:yourschema.xml">
  <title>Presenting XML</title>
  <author>Richard Light</author>
</book>
You define elements and attributes in a Schema by specifying <ElementType>
and <AttributeType> tags. Instances of
elements or attributes are declared using the <element>
and <attribute> tags.
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Definitions of the individual elements -->
  <ElementType name="title" />
  <ElementType name="author" />
  <ElementType name="pages" />
  <!-- Definition of book, including its content and its attribute -->
  <ElementType name="book" model="closed">
    <element type="title" />
    <element type="author" />
    <element type="pages" />
    <AttributeType name="copyright" />
    <attribute type="copyright" />
  </ElementType>
</Schema>
Here, there are four <ElementType> elements: "title", "author",
"pages" and "book". These are the definitions of the elements. The content
of a book is declared within the "book" ElementType. Each book contains
"title", "author" and "pages" elements, declared with <element> tags
whose type attribute references the corresponding ElementType.
Similarly, you define an <AttributeType> for the copyright attribute
and then declare its usage with an <attribute> tag whose type
attribute references that definition.
Because the copyright AttributeType is defined within the "book" ElementType,
it is local to that element type. Thus, different element types can declare
attributes with the same name but potentially different meanings.
<AttributeType> elements can also be declared globally by placing them
outside of an ElementType. Then, multiple elements can share a common attribute
type without having to redeclare the AttributeType inside each ElementType.
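As a rough sketch of a globally declared attribute type, the schema below declares an AttributeType once, directly under the Schema element, and references it from two element types. The "language" attribute and the "magazine" element type are hypothetical names invented for this illustration; they are not part of the book example.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Declared outside any ElementType, so every ElementType can reference it -->
  <AttributeType name="language" />

  <ElementType name="title" content="textOnly" />

  <ElementType name="book" content="eltOnly">
    <element type="title" />
    <attribute type="language" />
  </ElementType>

  <!-- "magazine" is a hypothetical second element type sharing the attribute -->
  <ElementType name="magazine" content="eltOnly">
    <element type="title" />
    <attribute type="language" />
  </ElementType>
</Schema>
```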
A content model indicates what an element can contain.
In the above example, a "book" element is defined to contain a sequence of
"title", "author" and "pages" elements. Thus, a valid XML file might look like:
<book xmlns="x-schema:book-schema.xml">
  <title>Cooking 101: A Cookbook for Beginners</title>
  <author>Joseph Cook</author>
  <pages>392</pages>
</book>
If the book element contains any elements other than
those specified (illustrator for instance) the XML document will not validate.
The book content is a closed model due to its model="closed"
attribute.
Open Content Models
Open content models allow additional elements and attributes to appear
within an element without each one having to be declared in the XML
Schema. Content models are open by default.
If model="closed" is removed from the "book" ElementType (or changed to
model="open"), the following becomes a valid XML document:
<book xmlns="x-schema:book-schema.xml" xmlns:new="urn:new-namespace">
  <title>Cooking 101: A Cookbook for Beginners</title>
  <author>Joseph Cook</author>
  <pages>392</pages>
  <new:illustrator>John Doe</new:illustrator>
</book>
A few rules apply to open content models:
- You can't add or remove content in a way that breaks the existing content
  model. For example, since <book> is defined as a sequence, the valid data
  must provide that exact sequence first, before adding any "open" content.
  Removing the <pages> element or providing two <title> elements next to each
  other would cause validation to fail.
- You can add undeclared elements as long as they are defined in a different
  namespace.
- You can add other elements declared in the same schema. For example, a second
  <title> element after the <pages> element will validate.
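To illustrate the third rule, here is a sketch of a document that, going by the rules above, should validate against an open version of the book schema. It assumes that book-schema.xml declares "book" without model="closed"; the text of the second title is made up.

```xml
<!-- Assumes book-schema.xml declares book with an open content model -->
<book xmlns="x-schema:book-schema.xml">
  <!-- The declared sequence comes first and is complete -->
  <title>Cooking 101: A Cookbook for Beginners</title>
  <author>Joseph Cook</author>
  <pages>392</pages>
  <!-- Extra element declared in the same schema, placed after the sequence -->
  <title>Cooking 101, Second Printing</title>
</book>
```

Moving the second title up so that it sits directly after the first one would break the declared sequence, which is the failure case described in the first rule.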
Element Content
An element can contain text, other elements, a mixture of text and elements, or
nothing at all. The content attribute specifies
what the element can contain.
Here's an example and the valid content values.
<ElementType name="title" content="textOnly"/>
Value       Description
textOnly    The element can contain text but no sub-elements.
eltOnly     The element can contain sub-elements only.
empty       Text and sub-elements are not allowed.
mixed       Both text and sub-elements are allowed.
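The following sketch shows each content value in use. All element names other than "title" ("pagebreak", "emphasis", "para" and "chapter") are hypothetical and chosen only for illustration; the comments show what a matching instance fragment might look like.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- textOnly: <title>Presenting XML</title> -->
  <ElementType name="title" content="textOnly" />

  <!-- empty: <pagebreak/> -->
  <ElementType name="pagebreak" content="empty" />

  <!-- mixed: <para>Some <emphasis>important</emphasis> text</para> -->
  <ElementType name="emphasis" content="textOnly" />
  <ElementType name="para" content="mixed">
    <element type="emphasis" />
  </ElementType>

  <!-- eltOnly: chapter holds title and para elements but no bare text -->
  <ElementType name="chapter" content="eltOnly">
    <element type="title" />
    <element type="para" maxOccurs="*" />
  </ElementType>
</Schema>
```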
Element Occurrences
The minOccurs and maxOccurs
attributes specify how many times an element can appear within another element.
<element type="item" maxOccurs="*" />
The maxOccurs attribute specifies the maximum number of times a sub-element can
appear. Valid values are integers and "*", which means that an unrestricted
number of elements may appear. The default value is "1". However, when
content="mixed", the default value is "*".
You can specify a minimum number of times a sub-element may appear with
minOccurs. To make a sub-element optional, set minOccurs to "0". The default
value is 1.
These attributes can be used for both element and group declarations.
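As a small sketch of the two attributes working together, the schema below makes one sub-element optional and another repeatable. The "purchase", "note" and "item" element names are assumptions made up for this example.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="note" content="textOnly" />
  <ElementType name="item" content="textOnly" />

  <ElementType name="purchase" content="eltOnly">
    <!-- Optional: zero or one note -->
    <element type="note" minOccurs="0" maxOccurs="1" />
    <!-- Required and repeatable: one or more items -->
    <element type="item" minOccurs="1" maxOccurs="*" />
  </ElementType>
</Schema>
```

Under this sketch, a purchase with three items and no note would be acceptable, while a purchase with no items at all would not.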
Sub Element Order
The order attribute specifies if sub-elements must
appear in a certain order, and if only one sub-element of a set can appear.
Legal values are seq, one and many.
The seq value indicates that sub-elements must appear in the order listed
in the schema (title, author, pages). For example:
<ElementType name="book" order="seq">
The one value specifies that only one sub-element can be used from a list
of sub-elements. For example, to specify that an "Item" element may contain
either a "product" element or a "backOrderedProduct" element, but not both:
<ElementType name="Item" order="one">
  <element type="product" />
  <element type="backOrderedProduct" />
</ElementType>
The many value indicates that the sub-elements may appear in any order,
and in any quantity. The default value for order is "seq" when the content
attribute is set to "eltOnly", and the default is "many" when content is set to
"mixed".
Element Grouping
The group element lets you specify rules for a
specific set of sub-elements. To indicate that the "Item" element has either a
"product" or a "backOrderedProduct" element, and then a "quantity" and "price",
you can use the following XML:
<ElementType name="Item">
  <group order="one">
    <element type="product" />
    <element type="backOrderedProduct" />
  </group>
  <element type="quantity" />
  <element type="price" />
</ElementType>
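For illustration, an instance document that follows this grouping might look like the sketch below. The file name "item-schema.xml" is hypothetical, and the sketch assumes that schema also defines the "product", "backOrderedProduct", "quantity" and "price" element types; the values are made up.

```xml
<!-- "item-schema.xml" is a hypothetical file holding the Item schema -->
<Item xmlns="x-schema:item-schema.xml">
  <!-- Exactly one member of the group: product or backOrderedProduct -->
  <backOrderedProduct>Widget</backOrderedProduct>
  <quantity>3</quantity>
  <price>9.95</price>
</Item>
```

Supplying both a product and a backOrderedProduct element would violate the order="one" rule on the group.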
Attributes
Attributes are different from elements, and the rules that apply to elements do
not apply to attributes. Also, different element types can have attributes with
the same name, but those attributes are independent and unrelated.
You can specify whether an attribute is required or optional and
you can limit its value to a small set of strings. You can also indicate a
default value to be used if the attribute is omitted from an element.
<!-- Make the attribute required. -->
<AttributeType name="shipTo" dt:type="idref" required="yes" />
<!-- Limit the attribute's values to high, medium and low. -->
<AttributeType name="priority" dt:type="enumeration" dt:values="high medium low" />
<!-- Provide a default value of 1. -->
<AttributeType name="quantity" dt:type="int" />
<attribute type="quantity" default="1" />
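To show these declarations in context, here is a minimal sketch of a complete schema that attaches a required, enumerated attribute and a defaulted attribute to one element type. The "lineItem" element name is an assumption made up for this example.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- "lineItem" is a hypothetical element used only for illustration -->
  <ElementType name="lineItem" content="empty">
    <AttributeType name="priority" dt:type="enumeration"
                   dt:values="high medium low" />
    <AttributeType name="quantity" dt:type="int" />
    <!-- priority must be present on every lineItem -->
    <attribute type="priority" required="yes" />
    <!-- quantity may be omitted, in which case the default of 1 applies -->
    <attribute type="quantity" default="1" />
  </ElementType>
</Schema>
```

With this sketch, a lineItem element carrying only priority="high" would be acceptable, while one carrying priority="urgent" would not.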
Unlike DTDs, XML schemas let you specify a data type
for an element or attribute. To use a data type, your schema must declare the
datatypes namespace:
<Schema name="myschema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- ... -->
</Schema>
Data types can be specified on <ElementType> and <AttributeType>
tags using one of the following syntaxes:
<!-- Syntax 1: dt:type attribute on the ElementType -->
<ElementType name="pages" dt:type="int" />

<!-- Syntax 2: nested datatype element -->
<ElementType name="pages">
  <datatype dt:type="int" />
</ElementType>
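As a brief sketch of what typed declarations can look like, the schema below assigns a few of the commonly used data types to element types. The "price", "published" and "inPrint" elements are hypothetical additions to the book example, and how strictly each type is enforced depends on the parser version.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="title" dt:type="string" />
  <ElementType name="pages" dt:type="int" />
  <!-- The three element types below are hypothetical -->
  <ElementType name="price" dt:type="float" />
  <ElementType name="published" dt:type="date" />
  <ElementType name="inPrint" dt:type="boolean" />
</Schema>
```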
Although schemas allow you to specify data types, IE's XML parser does not fully
support them. You can read more about data types by visiting
Microsoft's web site.
XML Schemas are extensible. They are built on an open content model. You are
free to add your own elements and attributes to XML Schema documents.
For example, you could add additional constraints to a "pages" element. This
sample declares the "pages" element. Extended tags from the "myExt" namespace
augment this information with an added rule that books must have a minimum of
50 pages and a maximum of 100 pages.
<ElementType name="pages" xmlns:myExt="urn:myschema-extensions">
  <myExt:min>50</myExt:min>
  <myExt:max>100</myExt:max>
</ElementType>
Although the XML parser will not use the additional "myExt" constraints when
validating the XML data, your application can.
Schemas can use other schemas, allowing you to build a new schema from
existing ones. Say you already have a schema that defines an "Address" element.
Using namespaces, you can use that schema in your new schema by adding a
namespace declaration for it.
For an example, see the new schema reference in the sample under
Open Content Models. You can also read about namespaces in my
XML tutorial or on Microsoft's
site.
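A minimal sketch of this kind of reuse, under stated assumptions, might look like the following. The file name "address-schema.xml", the "addr" prefix, and the "customer" and "name" element types are all hypothetical; the sketch assumes the other schema defines an "Address" ElementType and that the parser resolves the x-schema: reference.

```xml
<?xml version="1.0"?>
<!-- "address-schema.xml" is a hypothetical existing schema declaring Address -->
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:addr="x-schema:address-schema.xml">
  <ElementType name="name" content="textOnly" />

  <ElementType name="customer" content="eltOnly">
    <element type="name" />
    <!-- Refers to the Address element type defined in the other schema -->
    <element type="addr:Address" />
  </ElementType>
</Schema>
```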