by Thomas Erl
No doubt you've heard of XML, but how much do you understand about this technology? Some industry analysts have called it "revolutionary," and others have gone as far as to compare the importance of XML to the invention of the Internet itself. Whatever the case, it is true that XML is both a significant innovation and a powerful technology. It has caused a major shift in the e-Business world, reshaping how applications are built and how data is managed.
To appreciate the meaning of XML, we first need to take a quick look back at what motivated its creation. In the mid-seventies Charles F. Goldfarb introduced the world to the Standard Generalized Markup Language (SGML). This hugely successful meta language established an international standard for data representation. SGML empowered organizations to take control of corporate intelligence, and evolved into a universal format for information exchange.
Over a decade later, Tim Berners-Lee conceived the World Wide Web. Soon thereafter, he would go on to found the World Wide Web Consortium (W3C). One of the first initiatives of this organization was to create a formal specification for the Hypertext Markup Language (HTML), a language based on SGML.
HTML provides a syntax used to describe the formatting and layout of document text. Its compact size and overall simplicity allowed it to become the standard document format for Web publishing, and perhaps the most widely used technical language in the world. In the mid-to-late 1990s it became evident that the Internet would be used for more than just document publishing. The advent of e-Business created a compelling need for a standard, business-centric data representation format. The W3C responded by once again turning to SGML. The result was the Extensible Markup Language (XML), a meta language intended to supplement HTML's presentation features with the ability to describe the nature of the information being presented.
Without XML, information passed over the Internet has little meaning or context beyond its presentation value. XML adds a layer of intelligence to information, proportional to the intelligence with which it is applied. This layer can be extended throughout an organization, and beyond.
A good point of reference for learning about XML is the HTML language. HTML allows us to describe how information should be rendered by a Web browser. By itself, however, an HTML document cannot tell us anything about the nature of the data being displayed. A financial report may be clearly formatted and displayed by an HTML document, but the data being presented has no underlying meaning - it is, in effect, nothing more than a picture.
Why is this a problem? Let's say company A and company B want to do business over the Internet. If company A sends company B the results of its financial report as an HTML document, someone at company B would have to read and interpret the document ("look at the picture") in order to further use this information. This lack of "information quality" limits its usefulness and severely inhibits the potential of the Internet as a mechanism for information sharing. This is where XML comes in.
XML solves this problem by allowing us to supplement content with "meta information": self-descriptive labels for each piece of text, which travel wherever the document goes. This turns each Web document into a self-contained mini-repository. If company A sends company B its financial report using XML, company B can:
• programmatically manipulate the report's data
• import the report data into a database
• store the report within its corporate document-set
• create different views of the report by sorting and filtering the data
... and so on.
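As a rough illustration of the first and last points, the sketch below reads a small report with Python's standard xml.etree.ElementTree module, then filters and sorts its rows. The report vocabulary (report, item, account, amount) and the figures are invented for this example and do not come from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical report vocabulary: element names and values are
# assumptions made up for this illustration.
REPORT_XML = """
<report company="A" period="Q1">
  <item><account>Sales</account><amount>1200.50</amount></item>
  <item><account>Travel</account><amount>-310.00</amount></item>
  <item><account>Licensing</account><amount>450.25</amount></item>
</report>
"""


def load_items(xml_text):
    """Return a list of (account, amount) tuples from the report."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("account"), float(item.findtext("amount")))
        for item in root.findall("item")
    ]


def income_only(items):
    """Filter: keep only rows with a positive amount."""
    return [row for row in items if row[1] > 0]


def by_amount(items):
    """Sort: largest amount first."""
    return sorted(items, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for account, amount in by_amount(income_only(load_items(REPORT_XML))):
        print(f"{account}: {amount:.2f}")
```

Because the labels travel with the data, the receiving program never has to guess which number is an amount and which text is an account name.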
As with HTML, XML is implemented using a set of elements. Unlike HTML, however, XML elements are not predefined. They can be customized to represent data in unique contexts. A set of related XML elements can be classified as a vocabulary. Vocabularies can be created to describe specific types of business documents, such as invoices or purchase orders. An instance of a vocabulary is called an XML document. An XML document is the most fundamental building block of an XML architecture.
Organizations that need to exchange information can agree on a standard set of vocabularies. Alternatively, they can enlist a transformation technology to dynamically translate vocabulary formats.
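To make the idea of translating between vocabularies concrete, here is a minimal sketch that renames elements from one hypothetical vocabulary to another using Python's standard library. In practice this role is commonly filled by a dedicated transformation language such as XSLT; the hand-written renaming below is only a stand-in, and both vocabularies (invoice/customer/total and bill/client/amountDue) are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: company A's element names on the left,
# company B's equivalents on the right.
TAG_MAP = {"invoice": "bill", "customer": "client", "total": "amountDue"}


def translate(element, tag_map):
    """Return a copy of `element` with tags renamed according to `tag_map`.

    Tags missing from the map are copied unchanged; attributes and
    text are preserved.
    """
    new_el = ET.Element(tag_map.get(element.tag, element.tag), dict(element.attrib))
    new_el.text = element.text
    new_el.tail = element.tail
    for child in element:
        new_el.append(translate(child, tag_map))
    return new_el


if __name__ == "__main__":
    source = ET.fromstring(
        "<invoice id='42'><customer>Company B</customer><total>99.00</total></invoice>"
    )
    print(ET.tostring(translate(source, TAG_MAP), encoding="unicode"))
```

A real transformation usually does more than rename tags (restructuring, merging or splitting values), which is why a purpose-built technology is normally preferred.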
Vocabularies can be formally defined using an XML schema language. The same way database schemas establish a structural model for the data they represent, XML schemas define the structure of XML documents. They protect the integrity of XML document data by providing structure, validation rules, type constraints and inter-element relationships. In other words, XML schemas dictate what can and cannot be done with XML data.
Numerous XML schema languages exist. The two most common are explained in the "Document Type Definitions (DTD)" and "XML Schema Definition Language" tutorials, following this section.
XML documents are generally manipulated using tree-based, event-based or class-based interfaces. The W3C provides a standard tree-based API called the Document Object Model (DOM). The most popular event-based API is the Simple API for XML (SAX), and most development platforms offer proprietary data binding APIs that supply a class-based interface into XML documents.
Vendor-specific implementations of DOM and SAX APIs can vary in their compliance with the DOM and SAX standards. Some that do comply extend the standard functionality with proprietary features. Using a compatible programming language, you can interact with the parser's API to manipulate XML documents in many different ways.
The Document Object Model expresses an XML document using a hierarchical tree view. Each branch of the tree represents an element in the hierarchy. The DOM classifies these elements as nodes, and the API provided by the DOM is also referred to as the node interface. Though the use of DOM-compliant APIs is very common, they can introduce some performance challenges. The API loads the entire XML tree into memory, which can consume a significant amount of resources when processing large documents.
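The tree-based model can be sketched with xml.dom.minidom, a lightweight DOM implementation in Python's standard library. The sample document and the two helper functions are assumptions made for this illustration; the author value in particular is a placeholder.

```python
from xml.dom.minidom import parseString

# Sample document for illustration; the author value is a placeholder.
BOOK_XML = (
    "<book>"
    "<title>Joy of Integration</title>"
    "<author>Placeholder Author</author>"
    "</book>"
)


def child_element_names(xml_text):
    """Parse the whole document into a tree and list the root's child elements."""
    document = parseString(xml_text)  # the entire tree is built in memory here
    root = document.documentElement
    return [
        node.tagName
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]


def text_of(xml_text, tag_name):
    """Return the text content of the first element with the given tag name."""
    document = parseString(xml_text)
    element = document.getElementsByTagName(tag_name)[0]
    return element.firstChild.data


if __name__ == "__main__":
    print(child_element_names(BOOK_XML))
    print(text_of(BOOK_XML, "title"))
```

Every element, and every piece of text inside it, is a node in the tree, so the application can move freely up, down and across the document once it has been parsed.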
The event-based API provided by SAX establishes a linear processing model: as the parser reads through the document, it notifies the application logic of events, such as the start and end of each element, and delivers the data piece by piece instead of building a complete tree first. This approach is very efficient, and addresses many of the performance concerns of DOM. The SAX and DOM APIs complement each other and collectively provide a flexible programming model for XML.
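A minimal sketch of event-based processing, using the xml.sax module from Python's standard library. The TitleCollector class and the collect_titles helper are names invented for this example; the handler simply gathers the text of every title element as the parser streams through the input.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of each <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when the parser reaches an opening tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when the parser reaches a closing tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Stream through the document and return all title texts in order."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    sample = "<library><book><title>First</title></book><book><title>Second</title></book></library>"
    print(collect_titles(sample))
```

Only the state the handler chooses to keep stays in memory, which is why this style suits large documents; the trade-off is that the application must track context itself.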
Data binding APIs are a departure from the structure-oriented nature of DOM and SAX. They allow for a data-centric programming approach, where business classes are provided as the interface into XML document data. Many variations of data binding APIs exist, each with a unique feature-set.
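The data-centric style can be approximated by hand, as in the sketch below. It is not a real data binding product (those typically generate such classes from a schema); the Book class, its fields and its from_xml and to_xml methods are all assumptions made for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """A business class acting as the interface to a <book> document."""

    title: str
    author: str
    category: str = ""

    @classmethod
    def from_xml(cls, xml_text):
        """Build a Book from XML text; missing parts become empty strings."""
        root = ET.fromstring(xml_text)
        return cls(
            title=root.findtext("title", default=""),
            author=root.findtext("author", default=""),
            category=root.get("category", ""),
        )

    def to_xml(self):
        """Serialize the Book back into XML text."""
        root = ET.Element("book", {"category": self.category})
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "author").text = self.author
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    book = Book.from_xml(
        '<book category="sample"><title>Example Title</title>'
        "<author>Example Author</author></book>"
    )
    book.title = "Revised Title"  # work with fields, not nodes or events
    print(book.to_xml())
```

Application code written this way deals with titles and authors rather than nodes and events, which is the essence of the class-based approach.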
Let's take a brief look inside a simple XML document. The first line of markup you will encounter is the XML declaration. It establishes the version of the XML specification being used:
<?xml version="1.0"?>
The part of a document within which data is represented is considered the document instance. It consists of a series of elements that tag data values with meta information.
An XML document instance orders its information into a hierarchical structure, defined by parent-child relationships between elements. Typically, the parent element establishes a context that is inherited by the child element.
In the example below, for instance, the book element has two child elements, title and author:
<book>
<title>Joy of Integration</title>
<author>...</author>
</book>
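Individual elements can also have properties, known as attributes. Whereas a parent element can have multiple layers of nested child elements, its relationship with an attribute is strictly one-to-one: each attribute name can appear only once on an element, and its value is a single piece of text that cannot contain nested structure.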
In our example, we've added the category attribute to the book element:
<book category="...">
To associate a document with a DTD, a separate declaration statement is typically required. Here we link the DTD to our XML document.
<!DOCTYPE book SYSTEM "book.dtd">
Finally, here's a look at the entire document we just built.
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book category="...">
<title>Joy of Architecture</title>
<author>...</author>
</book>
The syntactical conventions introduced here form the basis for all specifications that exist as specialized implementations (or applications) of XML.