Another Introduction to XML
by Thomas Erl

No doubt you've heard of XML, but how much do you understand about this technology? Some industry analysts have called it "revolutionary," and others have gone as far as to compare the importance of XML to the invention of the Internet itself. Whatever the case, XML is both a significant innovation and a powerful technology. It has caused a major shift in the e-Business world, reshaping how applications are built and how data is managed.

To appreciate the meaning of XML, we first need to take a quick look back at what motivated its creation. In the mid-seventies, Charles F. Goldfarb introduced the world to the Standard Generalized Markup Language (SGML). This hugely successful meta language established an international standard for data representation. SGML empowered organizations to take control of corporate intelligence, and evolved into a universal format for information exchange.

Over a decade later, Tim Berners-Lee conceived the World Wide Web. Soon thereafter, he would go on to found the World Wide Web Consortium (W3C). One of the first initiatives of this organization was to create a formal specification for the Hypertext Markup Language (HTML), a language based on SGML.

HTML provides a syntax used to describe the formatting and layout of document text. Its compact size and overall simplicity allowed it to become the standard document format for Web publishing, and perhaps the most widely used technical language in the world. In the mid-to-late 1990s it became evident that the Internet would be used for more than just document publishing. The advent of e-Business created a compelling need for a standard business-centric data representation format. The W3C responded by once again turning to SGML. The result was the Extensible Markup Language (XML), a meta language intended to supplement HTML's presentation features with the ability to describe the nature of the information being presented.

Without XML, information passed over the Internet has little meaning or context beyond its presentation value. XML adds a layer of intelligence to information, proportional to the intelligence with which it is applied. This layer can be extended throughout an organization, and beyond.

A good point of reference for learning about XML is the HTML language. HTML allows us to describe only how information should be rendered by a Web browser. As a result, an HTML document cannot tell us anything about the nature of the data being displayed. A financial report may be clearly formatted and displayed by an HTML document, but the data being presented has no underlying meaning; it is, in effect, nothing more than a picture.

Why is this a problem? Let's say company A and company B want to do business over the Internet. If company A sends company B the results of its financial report as an HTML document, someone at company B would have to read and interpret the document ("look at the picture") before the information could be used any further. This lack of "information quality" limits the document's usefulness and severely inhibits the potential of the Internet as a mechanism for information sharing. This is where XML comes in.

XML solves this problem by allowing us to supplement content with "meta information": self-descriptive labels for each piece of text that go wherever the document goes. This turns each Web document into a self-contained mini-repository. If company A sends company B its financial report using XML, company B can:

• programmatically manipulate the report's data
• import the report data into a database
• store the report within its corporate document-set
• create different views of the report by sorting and filtering the data
... and so on.
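
To make the list above a little more concrete, here is a minimal sketch in Python of what company B might do with a labeled report. The report vocabulary (the report and item elements, and the name and amount attributes) is invented for illustration and is not taken from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical report vocabulary: these element and attribute names
# are assumptions made for this example only.
REPORT_XML = """<?xml version="1.0"?>
<report company="A" period="Q1">
  <item name="Revenue" amount="1200"/>
  <item name="Expenses" amount="800"/>
  <item name="Taxes" amount="150"/>
</report>"""


def load_items(xml_text):
    """Return (name, amount) pairs read from the labeled report data."""
    root = ET.fromstring(xml_text)
    return [(item.get("name"), int(item.get("amount")))
            for item in root.iter("item")]


items = load_items(REPORT_XML)

# One "view" of the report: line items sorted by amount, largest first.
by_amount = sorted(items, key=lambda pair: pair[1], reverse=True)

# Another view: only the names of items at or above a threshold.
large_items = [name for name, amount in items if amount >= 500]

print(by_amount)
print(large_items)
```

Because every value carries its own label, none of this requires a person to read the report first. The same pairs could just as easily be written to a database table.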

As with HTML, XML is implemented using a set of elements. Unlike HTML, however, XML elements are not predefined. They can be customized to represent data in unique contexts. A set of related XML elements can be classified as a vocabulary. Vocabularies can be created to describe specific types of business documents, such as invoices or purchase orders. An instance of a vocabulary is called an XML document. An XML document is the most fundamental building block of an XML architecture.

Organizations that need to exchange information can agree on a standard set of vocabularies. Alternatively, they can enlist a transformation technology to dynamically translate vocabulary formats.
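
As a rough illustration of translating between vocabularies, the sketch below renames the elements of one document to match another organization's vocabulary. In practice this job is commonly given to a dedicated transformation language such as XSLT; plain Python is used here only to keep the example self-contained. The element names and the NAME_MAP table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from company A's element names to company B's.
NAME_MAP = {"purchaseOrder": "order", "buyer": "customer", "total": "amount"}


def translate(xml_text, name_map):
    """Rename every element that has an entry in name_map; keep the rest."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = name_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


SOURCE = "<purchaseOrder><buyer>Company B</buyer><total>250</total></purchaseOrder>"
translated = translate(SOURCE, NAME_MAP)
print(translated)
```

Only the labels change; the data values and the parent-child structure are carried over untouched.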

Vocabularies can be formally defined using an XML schema language. In the same way that database schemas establish a structural model for the data they represent, XML schemas define the structure of XML documents. They protect the integrity of XML document data by providing structure, validation rules, type constraints and inter-element relationships. In other words, XML schemas dictate what can and cannot be done with XML data.

Numerous XML schema languages exist. The two most common are explained in the "Document Type Definitions (DTD)" and "XML Schema Definition Language" tutorials, following this section.

XML documents are generally manipulated using tree-based, event-based or class-based interfaces. The W3C provides a standard tree-based API called the Document Object Model (DOM). The most popular event-based API is the Simple API for XML (SAX), and most development platforms offer proprietary data binding APIs that supply a class-based interface into XML documents.

Vendor-specific implementations of DOM and SAX APIs can vary in their compliance with the DOM and SAX standards. Some that do comply extend the standard functionality further by adding proprietary extensions. Using a compatible programming language, you can interact with the parser's API to manipulate XML documents in many different ways.

The Document Object Model expresses an XML document using a hierarchical tree view. Each branch of the tree represents an element in the hierarchy. The DOM classifies these elements as nodes, and the API provided by the DOM is also referred to as the node interface. Though the use of DOM-compliant APIs is very common, they can introduce some performance challenges. The API loads the entire XML tree view into memory, which can consume a significant amount of resources when processing larger documents.
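
The tree-based model can be sketched with xml.dom.minidom, a lightweight DOM implementation from Python's standard library. This is one possible illustration, not the only way to work with the DOM, and the sample book document is assumed for the example.

```python
from xml.dom.minidom import parseString

BOOK_XML = """<?xml version="1.0"?>
<book category="Fiction">
  <title>Joy of Integration</title>
  <author>Joe Smith</author>
</book>"""

# The whole document is parsed into an in-memory tree of nodes.
dom = parseString(BOOK_XML)
book = dom.documentElement

# Attributes are read from the element node that carries them.
category = book.getAttribute("category")

# Whitespace between elements shows up as text nodes, so we keep
# only the element nodes when listing the children of book.
child_names = [node.nodeName for node in book.childNodes
               if node.nodeType == node.ELEMENT_NODE]

# The text inside title is itself a child node of the title element.
title_text = book.getElementsByTagName("title")[0].firstChild.data

print(category, child_names, title_text)
```

Note that the tree stays in memory for as long as dom is referenced, which is the resource cost described above.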

The event-based API provided by SAX establishes a linear processing model that notifies the application logic of certain events prior to delivering the data. This approach is very efficient, and addresses many of the performance concerns of DOM. The SAX and DOM APIs complement each other and collectively provide a flexible programming model for XML.
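
For comparison, here is a minimal sketch of the event-based model using the xml.sax module from Python's standard library. The BookHandler class and its events list are names made up for this example. The parser calls the handler's methods as it moves through the document, and no tree is built.

```python
import xml.sax

BOOK_XML = """<?xml version="1.0"?>
<book category="Fiction">
  <title>Joy of Integration</title>
  <author>Joe Smith</author>
</book>"""


class BookHandler(xml.sax.ContentHandler):
    """Records the parser's callbacks in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if text:
            self.events.append(("text", text))
        self.events.append(("end", name))
        self._text = []


handler = BookHandler()
xml.sax.parseString(BOOK_XML.encode("utf-8"), handler)

for event in handler.events:
    print(event)
```

The handler sees each element once, in document order, and keeps only what it chooses to keep. That is why this style suits large documents, and also why it is less convenient when the application needs to move around the document freely.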

Data binding APIs are a departure from the structure-oriented nature of DOM and SAX. They allow for a data-centric programming approach, where business classes are provided as the interface into XML document data. Many variations of data binding APIs exist, each with a unique feature-set.
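
Python's standard library does not ship a data binding API, so the sketch below hand-writes the kind of class such a tool would typically generate. The Book class and its from_xml and to_xml methods are hypothetical names chosen for illustration. The point is only that application code works with a business class rather than with nodes or events.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """A business class standing in for the book element."""
    category: str
    title: str
    author: str

    @classmethod
    def from_xml(cls, xml_text):
        """Build a Book from a book document."""
        root = ET.fromstring(xml_text)
        return cls(
            category=root.get("category", ""),
            title=root.findtext("title", default=""),
            author=root.findtext("author", default=""),
        )

    def to_xml(self):
        """Serialize the Book back into a book document."""
        root = ET.Element("book", {"category": self.category})
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "author").text = self.author
        return ET.tostring(root, encoding="unicode")


book = Book.from_xml(
    '<book category="Fiction">'
    "<title>Joy of Integration</title><author>Joe Smith</author></book>"
)
print(book.title, "by", book.author)
print(book.to_xml())
```

Code that uses Book never touches the XML structure directly, so a change to the vocabulary would be absorbed in one place.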

Let's take a brief look inside a simple XML document. The first line of markup you will encounter is the XML declaration. It establishes the version of the XML specification being used:

<?xml version="1.0"?>

The part of a document within which data is represented is considered the document instance. It consists of a series of elements that tag data values with meta information.

An XML document instance orders its information into a hierarchical structure, defined by parent-child relationships between elements. Typically, the parent element establishes a context that is inherited by the child element.

In the example below, for instance, the book element has two child elements, title and author:

<book>
  <title>Joy of Integration</title>
  <author>Joe Smith</author>
</book>

Individual elements can also have properties, known as attributes. Whereas a parent element can have multiple layers of nested child elements, an attribute is a single name-value pair: a given attribute name can appear only once on an element, and its value cannot contain nested markup.

In our example, we've added the category attribute to the book element:

<book category="Fiction">

To associate a document with a DTD, a separate document type declaration is typically required. Here we link the DTD to our XML document:

<!DOCTYPE book SYSTEM "book.dtd">

Finally, here's a look at the entire document we just built.

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book category="Fiction">
  <title>Joy of Integration</title>
  <author>Joe Smith</author>
</book>

The syntactical conventions introduced here form the basis for all specifications that exist as specialized implementations (or applications) of XML.
