An Extensible Markup Language (XML) Tutorial

This tutorial covers the basics of XML and many of the common features and terms associated with it. After completing this tutorial you should have a general understanding of XML and how and why to use it.

Discussions contained herein focus on Microsoft's implementation of XML. Thus, most samples require version 5.0 or later of their Internet Explorer browser. To learn more about Microsoft's XML products, visit their site.

What is XML

XML stands for Extensible Markup Language.

XML was designed to describe structured data. It's a markup language similar to HyperText Markup Language (HTML). Both derive from Standard Generalized Markup Language (SGML): XML is a simplified subset of SGML, while HTML is an application of SGML.

Unlike HTML, XML tags are not predefined. You make up your own unlimited set of tags. This is why it is extensible. XML is a meta-markup language (i.e. a language for defining your own markup languages), and because the tags name the data they enclose, an XML document is self-describing. Since you make up your own tags, XML can use a Document Type Definition (DTD) to describe its data to applications that use it.

XML was designed to describe data and focus on what the data is. HTML was designed to display data and focus on how the data looks. XML data can be viewed in a browser or it can be passed to other applications for processing and viewing.

XML standards are defined by the World Wide Web Consortium (W3C), ensuring that XML will be uniform and independent of applications or vendors. The W3C site is the most complete reference of XML available.

How can XML be used?

Data Separation
XML can keep data separated from your HTML. HTML pages are used to display data. Data is often stored inside HTML pages. With XML this data can be stored in a separate XML file. Thus, you can concentrate on using HTML for formatting and display, and be sure that changes in the underlying data will not force changes to any of your HTML code.

XML data can also be stored inside HTML pages as Data Islands. You can still concentrate on using HTML for formatting and displaying the data.

Different computer systems typically contain data in incompatible formats. This makes exchanging data between such systems difficult. Converting the data to XML greatly reduces this difficulty since the data can be read by different types of applications. XML can also be used to store data in files or databases.
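
As a rough illustration of this kind of exchange, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The record and its field names are made up for the example; one side writes the record out as XML text and the other side reads it back.

```python
import xml.etree.ElementTree as ET

# Hypothetical record held by one system (field names are illustrative only).
record = {"name": "Bill Smith", "city": "Westfield"}

# Export: build a <customer> element with one child element per field.
customer = ET.Element("customer")
for field, value in record.items():
    ET.SubElement(customer, field).text = value
xml_text = ET.tostring(customer, encoding="unicode")

# Import: a different application parses the text back into its own structure.
received = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(xml_text)
print(received == record)
```

The receiving side needs nothing but an XML parser and the element names, which is the point of using XML as the exchange format.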

What is an XML element?

XML is a set of rules for creating semantic tags used to describe data. An XML element is made up of a start and end tag with data in between. The tags describe the data. The data is called the value of the element. For example, this XML element is a <director> element with the value "Bill Smith."

<director>Bill Smith</director>

The element's name is "director" and allows you to mark up the value "Bill Smith" so you can differentiate it from another similar piece of data. Consider another element with the value "Bill Smith".

<actor>Bill Smith</actor>

Since each element has a different tag name, you can see one refers to Bill Smith, the director, while the other refers to Bill Smith, the actor.
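
To see how an application tells the two apart, here is a small Python sketch, assuming the standard library's xml.etree.ElementTree parser. It reads both elements and prints each element's name next to its value.

```python
import xml.etree.ElementTree as ET

director = ET.fromstring("<director>Bill Smith</director>")
actor = ET.fromstring("<actor>Bill Smith</actor>")

# The values are identical; only the element names differ.
print(director.tag, "=", director.text)
print(actor.tag, "=", actor.text)
```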

What is an XML document?

A basic XML document is simply an XML element that can - but might not - include nested XML elements.

Here is an example of an XML document:

<?xml version="1.0"?>
<message>
  <text>Don't forget to buy milk on the way home.</text>
</message>

The first line of the document is the XML declaration and should always be included. It defines the XML version of the document. In this case the document conforms to the 1.0 specification of XML:

<?xml version="1.0"?>

The next line defines the first or root element of the document:

<message>

The following line defines a child element of the root:

  <text>Don't forget to buy milk on the way home.</text>

The last line defines the end of the root element:

</message>


XML documents must adhere to the following strict syntax rules.

  • XML elements must have a closing tag
    Some HTML elements, such as the paragraph (<p>), don't need a closing tag. However, all XML elements must have a closing tag.

  • Empty Elements
    XML allows empty elements with this shorthand notation:

    <title></title> Normal notation
    <title/>          Shorthand notation

  • Tags must be properly nested
    Overlapping elements are not allowed. An element that is opened inside another element must be closed before its parent element is closed.

    <b><i>This text is bold and italic</b></i>  This is incorrect
    <b><i>This text is bold and italic</i></b>  This is correct

  • XML tags are case sensitive
    The following specify different elements:

    <City>     <CITY>     <city>

    <City>This is incorrect</city>
    <city>This is correct</city>

  • XML documents must have a root tag
    All XML documents must contain a single, unique tag pair to define the root element. All other elements must be nested within the root element. All elements can have sub (child) elements. Sub elements must be in pairs and correctly nested within their parent element.


  • Attribute values must always be quoted
    An element can optionally contain one or more attributes in its start tag. An attribute is a name-value pair separated by an equal sign (=). Attribute values must always be quoted.

    <CITY ZIP="01085">Westfield</CITY>

    ZIP="01085" is an attribute of the <CITY> element.

    Attributes are used to attach additional, secondary information to an element. Attributes can also accept default values, while elements cannot. Each attribute of an element can be specified only once, but in any order.

    <message date="12/11/99"> This is correct
    <message date=12/11/99>   This is incorrect

    <message ID="100"> The ID attribute can be used to identify which message
    <message ID="101">
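
One way to check these rules in practice is to hand a document to an XML parser and see whether it is accepted. The following is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree parser. The is_well_formed helper and the sample labels are names invented for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Each sample exercises one of the syntax rules listed above.
samples = {
    "missing closing tag": "<p>text",
    "empty element shorthand": "<title/>",
    "overlapping tags": "<b><i>bold and italic</b></i>",
    "properly nested tags": "<b><i>bold and italic</i></b>",
    "case mismatch": "<City>Westfield</city>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<message date=12/11/99>hi</message>",
    "quoted attribute": '<message date="12/11/99">hi</message>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the shorthand empty element, the properly nested tags and the quoted attribute are accepted. Every other sample makes the parser raise a ParseError.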


Valid XML Documents

A Valid XML document is a Well Formed XML document that adheres to the rules of a Document Type Definition (DTD). A DTD defines the legal elements of an XML document. DTDs can be inline in your XML document or externally referenced.

This XML document has a reference to an external DTD.

<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "InternalMessage.dtd">
<message>
  <text>Don't forget to buy milk on the way home.</text>
</message>
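
Python's standard library does not validate against a DTD, so the sketch below is not a validity check. Assuming xml.dom.minidom, it only shows how an application can read the DOCTYPE declaration and confirm one validity rule: the name in the DOCTYPE must match the root element. The external file InternalMessage.dtd is never loaded here.

```python
from xml.dom import minidom

document = """<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "InternalMessage.dtd">
<message>
  <text>Don't forget to buy milk on the way home.</text>
</message>"""

dom = minidom.parseString(document)
doctype = dom.doctype          # the <!DOCTYPE ...> declaration
root = dom.documentElement     # the root element of the document

print(doctype.name, doctype.systemId)
print(doctype.name == root.tagName)
```

Full validation against the DTD's element rules requires a validating parser, which is outside the scope of this sketch.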



Data Islands

A data island is an XML document that exists within an HTML page. It lets you script against the XML document without having to load it through script or through the <OBJECT> tag. Almost any well-formed XML document can be inside a data island. Data islands can be inline or external.

The <XML> element marks the beginning of the data island. Its ID attribute provides a way to reference the data island. The SRC attribute is used to identify the external XML file.

<XML ID="XMLID" SRC="customer.xml"></XML>

You can also use the <SCRIPT> tag to create a data island:

<SCRIPT LANGUAGE="XML" ID="XMLID">
  <customer>
    <name>Bill Smith</name>
  </customer>
</SCRIPT>

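Here is a complete example of an inline data island bound to the HTML:

<XML ID="XMLID">
  <customers>
    <customer>
      <name>Bill Smith</name>
    </customer>
    <customer>
      <name>John Doe</name>
    </customer>
    <customer>
      <name>Lisa Longo</name>
    </customer>
  </customers>
</XML>

<table datasrc="#XMLID">
<tr>
<td><div datafld="name"></div></td>
<td><div datafld="custID"></div></td>
</tr>
</table>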

The <XML> tag's ID attribute is used to reference the data island in the HTML. Using HTML tags that can accept data source tags (i.e. bind the HTML to the XML data), you can easily format and display the XML data. This HTML page displays the XML data in a table.

The <table> tag uses the DATASRC attribute to refer to the inline XML data island whose ID attribute is XMLID. The <TD> element itself can't be bound to data but the <div> tag can. The DATAFLD attribute indicates which XML element to place in the cell of the table. As the XML is read, additional table rows are created for each element tagged with the <customer> tag.

The functionality within Internet Explorer to bind XML data to HTML is called the Data Source Object (DSO).
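
Data binding itself is an Internet Explorer feature, but the idea of producing one table row per record can be sketched outside the browser. The following Python example, using the standard library's xml.etree.ElementTree, is only an analogy and not how the DSO is implemented. The <customers> root element and the custID values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer data; the root name and custID values are made up.
island = ET.fromstring("""
<customers>
  <customer><name>Bill Smith</name><custID>1</custID></customer>
  <customer><name>John Doe</name><custID>2</custID></customer>
</customers>
""")

# These names play the role of the DATAFLD attributes in the table.
fields = ["name", "custID"]

# Build one <tr> per <customer>, with one <td><div> cell per bound field.
table = ET.Element("table")
for customer in island.findall("customer"):
    row = ET.SubElement(table, "tr")
    for field in fields:
        cell = ET.SubElement(row, "td")
        ET.SubElement(cell, "div").text = customer.findtext(field)

html = ET.tostring(table, encoding="unicode")
print(html)
```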



Namespaces

A namespace is a collection of names used as element or attribute names in an XML document. A namespace qualifies element names to make them unique on the Web to avoid conflicts between elements with the same name.

A namespace is identified by a Uniform Resource Identifier (URI) which can be either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). It doesn't matter what the URI points to. URIs are used because they are globally unique across the Internet.

Namespaces can be declared explicitly or by default. With explicit declarations, you define a prefix to qualify elements belonging to that namespace.

Here's an explicit declaration which defines the "bk" and "money" namespace prefixes. The xmlns attribute is an XML keyword for a namespace declaration. Elements and attributes starting with "bk:" or "money:" are from the "urn:BookLovers.org:BookInfo" and "urn:Finance:Money" namespaces respectively.


<bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
         xmlns:money="urn:Finance:Money">
     <bk:title>A Suitable Boy</bk:title>
     <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
</bk:BOOK>
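
To see what a namespace-aware parser makes of these prefixes, here is a short Python sketch assuming the standard library's xml.etree.ElementTree. That module reports each qualified name as {namespace URI}local-name, so the prefix itself disappears and only the URI matters. The ns mapping is a lookup table written for this example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
         xmlns:money="urn:Finance:Money">
  <bk:title>A Suitable Boy</bk:title>
  <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
</bk:BOOK>
""")

# Prefix-to-URI mapping used only for the lookup below.
ns = {"bk": "urn:BookLovers.org:BookInfo", "money": "urn:Finance:Money"}
price = book.find("bk:PRICE", ns)

print(book.tag)       # {urn:BookLovers.org:BookInfo}BOOK
print(price.attrib)   # {'{urn:Finance:Money}currency': 'US Dollar'}
```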

Default declarations define a namespace to be used for all elements within its scope. No prefix is used. A namespace declared without a prefix becomes the default namespace for the element that declares it and for everything nested inside it. All elements in that scope that don't have a prefix belong to the default namespace. Attributes without a prefix are not placed in the default namespace.

<BOOK xmlns="urn:BookLovers.org:BookInfo">
   <title>A Suitable Boy</title>
   <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
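
The effect of a default declaration can be checked the same way. In this Python sketch, again assuming xml.etree.ElementTree, every unprefixed element comes back qualified with the default namespace URI, while the unprefixed currency attribute keeps its plain name because default namespaces do not apply to attributes.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<BOOK xmlns="urn:BookLovers.org:BookInfo">
  <title>A Suitable Boy</title>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
""")

# Collect the qualified name of every element in document order.
tags = [element.tag for element in book.iter()]
price = book.find("{urn:BookLovers.org:BookInfo}PRICE")

print(tags)
print(price.attrib)   # {'currency': 'US Dollar'}
```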


Viewing XML With Internet Explorer 5+

You can use IE5+ to view an XML document. To open an XML document, click on a link to an XML file, type its URL in the address bar, or double-click on an XML document in a folder.

When you display an XML document in Explorer, IE shows the document with its root element and child elements expanded. Use the plus (+) and minus (-) signs to the left of the XML elements to expand or collapse the element structure.

Note: If you are not using IE5+, all bets are off.



Cascading Style Sheets

XML's goal is to separate data from its presentation. So then, how do you display the data in a neat format? You can use Cascading Style Sheets (CSS) just as you would format HTML.

The CSS associates formatting properties with the XML tags allowing the CSS to decorate the existing XML tree structure. Problem is, forethought must be used when designing the XML tree structure so you can display it properly. This violates the idea of separation of data.

The solution is to use Extensible Stylesheet Language (XSL) instead. XSL lets you transform the XML tree into a new tree without changing the XML source. Then the XML can be displayed differently just by switching style sheets.



Putting it all Together

We saw how to use data islands to include XML data in your HTML page, how to display the data using cascading style sheets and how to qualify XML data with namespaces.

Using all of these features you can embed HTML tags into your XML data and format the XML data for display.

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="AllTogether.css" ?>
<COURSE xmlns:HTML="http://www.w3.org/TR/REC-html40">
   <title>Putting it All Together</title>
     <HTML:LI>Line one</HTML:LI>
     <HTML:LI>Line two</HTML:LI>
   <HTML:BR />
   <HTML:IMG src="MyImage.jpg" />
</COURSE>

The special HTML namespace used has a predefined meaning in the browser. It instructs the browser to interpret any content in the HTML namespace as HTML rather than XML and to render it as such.

The AllTogether.css style sheet uses Media Styles. This allows you to specify one set of styles to be applied to online content and a different set to be used when IE prints the page.



If you use this code, please mention "www.TheScarms.com"


© Copyright 2024 TheScarms