February 8, 1999
XML: More Than Just A Quick Fix
Extensible Markup Language is seen as a universal object model that will enhance Web development and simplify application integration
By Don Kiely
Extensible Markup Language is finding many uses in IT departments--driven largely by an insatiable need for new applications to drive electronic commerce. XML will help developers build and deploy sophisticated Web applications faster. XML will also hasten enterprise integration efforts so important to supply chains and other business collaboration initiatives.
XML is a metalanguage for describing how data is structured and exchanged between clients and servers on an IP network. It's often viewed as an extension to HTML, because it allows many more tags, which can be used to format, structure, and search the information contained in an application.
XML is specifically optimized for the Web. And while its adoption is being driven by Web development, XML will likely have applicability in systems beyond the Internet, such as broad-based application development tools and systems-management and mainframe applications.
Currently, XML is a proposed standard under the auspices of the World Wide Web Consortium. It's derived from the long-standing Standard Generalized Markup Language specification. SGML was too complicated for Web documents, so a more efficient and network-aware specification was needed.
XML requires greater functionality to reach its full potential as a data standard and HTML killer. That's coming in the form of some complementary standards.
XML is winning acceptance from a range of vendors, such as IBM, Microsoft, Oracle, and Sun Microsystems. They're adding XML support to development tools, Web utilities, databases, and business applications. In the past quarter, two companies released XML servers: Object Design and Bluestone Software (see sidebar story, "Two Vendors Debut XML Servers").
Data About Data
By its nature, HTML is for formatting documents, primarily for viewing on the Web. XML is good at presenting data in a form that's easy to understand--unlike most computer languages and applications. If an application is told that an HTML document in the browser is an invoice, for example, it will have a hard time parsing out the various elements for further processing. There just isn't a standard way on the Web to identify and present an invoice and its individual pieces of data, or to relate it to other data.
But data in an XML document can be extracted, analyzed, and presented in any format by any application that can parse XML data. For example, the figure on the right displays the text of the play Hamlet marked up with XML to identify the type of content that each text string contains. Another application can read this data and produce an annotated script, and another could analyze speech patterns while ignoring the character descriptions.
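As a rough illustration of two applications reading the same marked-up text, here is a minimal Python sketch. The tag names (SCENE, SPEECH, SPEAKER, LINE, STAGEDIR) and the function names are assumptions chosen for illustration; they are not taken from the article's figure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: these tag names are assumptions, not the figure's.
PLAY = """\
<SCENE>
  <TITLE>Elsinore. A platform before the castle.</TITLE>
  <STAGEDIR>FRANCISCO at his post. Enter to him BERNARDO</STAGEDIR>
  <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
  <SPEECH><SPEAKER>FRANCISCO</SPEAKER>
    <LINE>Nay, answer me: stand, and unfold yourself.</LINE></SPEECH>
  <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Long live the king!</LINE></SPEECH>
</SCENE>
"""

scene = ET.fromstring(PLAY)


def script_lines(scene):
    """First consumer: a reading script with stage directions in brackets."""
    out = []
    for node in scene:
        if node.tag == "STAGEDIR":
            out.append(f"[{node.text}]")
        elif node.tag == "SPEECH":
            speaker = node.findtext("SPEAKER")
            for line in node.findall("LINE"):
                out.append(f"{speaker}: {line.text}")
    return out


def words_per_speaker(scene):
    """Second consumer: word counts per speaker, ignoring everything else."""
    counts = Counter()
    for speech in scene.iter("SPEECH"):
        speaker = speech.findtext("SPEAKER")
        for line in speech.findall("LINE"):
            counts[speaker] += len(line.text.split())
    return counts


print("\n".join(script_lines(scene)))
print(dict(words_per_speaker(scene)))
```

Neither function needs to know about the other, or about any tags it does not care about; both rely only on the markup.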
The Web is essentially a text-based medium with cumbersome adaptations for objects such as images and sound files. Text marked up with XML has the benefit of being cross-platform and simple to understand. XML also supports flexible data structures and international character sets (see chart below).
| Text Benefits Of XML Over Binary File Formats |
| Cross-platform: Text is one of the few data formats universally supported on computer platforms, even if character sets vary. |
| Simplicity: XML can organize data while keeping it in human-readable form. It can be viewed and edited in common text editors, and tags can be understandable strings. |
| Flexible data structures: XML is well suited to representing tree, hierarchical, and nested data, including the rooted tree structure of HTML; relational data, by mapping tables into a tree with rows as subtrees; and graphs, when used with the proposed Resource Description Framework. |
| International character sets: XML 1.0 is based on ISO-10646, the Unicode character set, so virtually any character in use in any language is legal within an XML document. |
Data: InformationWeek
XML is useful as a standard way to describe data, but is far from revolutionary by itself. When XML is combined with two other emerging standards, the Extensible Style Language (XSL) and XML Linking Language (XLL), these three specifications define a rich markup language that is flexible, extensible, and keeps separate a Web page's data, formatting, and navigation. This separation makes it easier for developers to create and change applications as system and business needs dictate.
XSL is a superset of Cascading Style Sheets, a proposed standard to allow flexible formatting of HTML files. What CSS does for HTML code, XSL does for XML, plus a lot more. XSL acts as almost a full processing language for the data contained within XML pages. It provides simple Select Case statements (xsl:choose), a programmatic construct that executes one of several blocks of code depending on which condition is met. It also supports conditional branching (xsl:if) and iteration (xsl:for-each), which repeats a block of code once for each element selected by a pattern.
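To make those three constructs concrete, here is a sketch of how they might appear in a stylesheet. It assumes the syntax of the later XSLT 1.0 Recommendation, which may differ from the working draft current when this article was written, and the source element names (invoice, item, name, price, rush) are hypothetical. Python's standard library has no XSL processor, so the script only confirms that the stylesheet is well-formed XML and lists the XSL elements it uses.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Assumption: XSLT 1.0 syntax; the invoice vocabulary is made up.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/invoice">
    <ul>
      <xsl:for-each select="item">
        <li>
          <xsl:value-of select="name"/>
          <xsl:if test="@rush = 'yes'"> (rush)</xsl:if>
          <xsl:choose>
            <xsl:when test="price &gt; 100"> - large order</xsl:when>
            <xsl:otherwise> - standard order</xsl:otherwise>
          </xsl:choose>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
"""

# fromstring raises ParseError if the stylesheet is not well-formed XML.
root = ET.fromstring(STYLESHEET)

# List the XSL elements in document order, skipping literal output (ul, li).
prefix = "{" + XSL_NS + "}"
constructs = [el.tag[len(prefix):] for el in root.iter()
              if el.tag.startswith(prefix)]
print(constructs)
```

Running the stylesheet against real data would need an XSL processor, which this sketch deliberately leaves out.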
There are two parts to the XSL specification, one for transforming XML documents and another for formatting semantics. Transforming XML documents involves associating patterns with the source tree contained within the XML document, using XSL templates. This is the XSL feature that provides almost complete flexibility to convert data into any format, including an HTML-formatted Web page.
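The idea of pairing patterns with templates can be sketched without an XSL processor. The Python below is an analogue of the approach, not XSL itself: each rule pairs an element name with an output template and recurses into child elements. The invoice vocabulary and the names RULES, render, and render_children are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document.
SOURCE = """\
<invoice number="1001">
  <customer>Acme Corp.</customer>
  <item><name>Widget</name><price>4.50</price></item>
  <item><name>Gadget &amp; case</name><price>12.00</price></item>
</invoice>
"""


def render(node):
    """Apply the rule registered for node.tag, or fall back to its text."""
    rule = RULES.get(node.tag)
    return rule(node) if rule else escape(node.text or "")


def render_children(node, tag):
    """Render every direct child with the given tag, in document order."""
    return "".join(render(child) for child in node.findall(tag))


# Each rule pairs a pattern (here just an element name) with a template.
RULES = {
    "invoice": lambda n: (
        f"<h1>Invoice {escape(n.get('number', ''))}</h1>"
        f"{render_children(n, 'customer')}"
        f"<table>{render_children(n, 'item')}</table>"
    ),
    "customer": lambda n: f"<p>{escape(n.text or '')}</p>",
    "item": lambda n: (
        f"<tr><td>{render_children(n, 'name')}</td>"
        f"<td>{render_children(n, 'price')}</td></tr>"
    ),
}

html_page = render(ET.fromstring(SOURCE))
print(html_page)
```

Swapping in a different RULES table would produce a different output format from the same source tree, which is the flexibility the transformation half of the specification is after.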
The other part of the XSL spec is used to provide formatting objects with attributes you can set for flexible formatting. XSL's formatting objects work much the same as style sheets in many word processors, using what the specification calls a formatting vocabulary. Each object represents a particular behavior, such as the construction of a page number with the page-number object or the formatting of a numbered list with the list flow object. Each object has a number of properties for fine-tuning the formatting.
XML and XSL together can provide what amounts to a customizable HTML. This lets Web designers define their own tags, rather than rely on Microsoft and Netscape to extend the language as they see fit, while keeping data separate from formatting.
But a major feature of the Web is navigation through hyperlinks, and nothing in either XML or XSL provides them. That's where XLL comes in. The two parts of this specification provide Web pages with far greater linking capabilities between documents and to parts of a document than was possible with HTML. The first part, XLink, gives documents advanced linking capabilities, such as links to multiple destinations, bidirectional links, and links with custom behaviors.
The other part of the XLL spec, XPointer, provides links to locations within documents even without special tags and a wider span of addressing to page elements, strings, and arbitrary selections. Web developers can use these features to exert more control over the relationships between documents.
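As a rough sketch of what link markup of this kind can look like to an application, the Python below reads simple links from a document and splits each target into a document part and a fragment part, the place where a pointer into the target document would go. The attribute names and namespace URI follow the later XLink 1.0 Recommendation and may differ from the draft discussed here; the reading-list vocabulary and file names are hypothetical. Only simple links are shown, not multi-destination or bidirectional ones.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical document; attribute names assume XLink 1.0.
DOC = """\
<reading-list xmlns:xlink="http://www.w3.org/1999/xlink">
  <entry xlink:type="simple"
         xlink:href="hamlet.xml#act3"
         xlink:title="Hamlet, Act III"/>
  <entry xlink:type="simple"
         xlink:href="http://www.example.com/notes.xml"
         xlink:title="Study notes"/>
  <entry>No link on this one</entry>
</reading-list>
"""


def collect_links(root):
    """Return one dict per element that carries an xlink:href attribute."""
    links = []
    for el in root.iter():
        href = el.get(f"{{{XLINK}}}href")
        if href is None:
            continue
        # Everything after '#' addresses a location inside the target.
        target, _, fragment = href.partition("#")
        links.append({
            "title": el.get(f"{{{XLINK}}}title"),
            "document": target,
            "fragment": fragment or None,
        })
    return links


links = collect_links(ET.fromstring(DOC))
for link in links:
    print(link)
```

Because the link information lives in attributes rather than in a dedicated anchor tag, any element in any vocabulary can act as a link.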
Unfortunately, the XSL and XLL specifications aren't complete. But the combination of these three specifications can produce HTML documents, other well-formed XML documents that process the source data into different forms, or almost any other document type.
XML Offshoots
What ensures XML's long-term viability is the groundswell of uses to which it is being put: in corollary standards, as the foundation for other standards, and in support of various software products. It seems as though each week brings another substantial XML announcement.
IBM, Unisys, and other partners recently unveiled their XML Metadata Interchange proposal. XMI integrates XML with the Unified Modeling Language, fast becoming the standard means of modeling software projects, and the Object Management Group's Meta Object Facility repository standard. XMI will let developers of distributed systems share object models. Its biggest advantage is that it enables collaborative application development over the Internet, with programmers sharing information and metadata.
One problem inherent in XML is the lack of a namespace--boundaries within which a tag name must be unique. For example, the tag
The solution is a proposed XML Namespace specification that describes how prefixes can uniquely identify a tag as belonging to a particular tag set. Each such prefix is bound to a URI that names the tag set it comes from.
This example, taken from the XML Namespaces working draft, shows how the namespace can be changed midway through a document by setting the xmlns attribute on a tag:
|
This is a funny book! |
Two or more XML namespaces, such as bk and isbn, can also be mixed in one document by prefixing each tag:
|
|
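A minimal sketch of both patterns, seen from the side of an application that parses them. The element names and namespace URIs are assumptions modeled on the book example in the W3C Namespaces in XML recommendation; they are not a restoration of this article's own listings.

```python
import xml.etree.ElementTree as ET

# Pattern 1: the default namespace is switched partway through the document.
SWITCHED = """\
<book xmlns="urn:loc.gov:books">
  <title>Cheaper by the Dozen</title>
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">This is a funny book!</p>
  </notes>
</book>
"""

# Pattern 2: two namespaces are mixed by prefixing each tag.
MIXED = """\
<bk:book xmlns:bk="urn:loc.gov:books"
         xmlns:isbn="urn:ISBN:0-395-36341-6">
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>
"""


def qualified_names(xml_text):
    """Return each element's {namespace}name in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


for name in qualified_names(SWITCHED) + qualified_names(MIXED):
    print(name)
```

The parser reports every tag together with the namespace it belongs to, so a title from the books tag set can never be confused with a title from some other vocabulary. The prefix itself is only shorthand; the URI it is bound to is what makes the name unique.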
Application Integration
These examples just begin to show emerging standards and technologies based on XML. XML's real benefit comes with integrating applications by making it easy to share data. XML has the potential to become the de facto standard for communicating content and objects on the Internet. Since XML describes data, collections of data become almost trivial to move between applications. No longer will a developer have to move data between formats on different platforms. Instead, data can be output in XML and input into XML-aware applications.
This leads to more flexible applications. Data can be used in any kind of application, because each application no longer needs to understand another application's format. Years ago, a friend of mine gave up on a project to compile documentation of a variety of proprietary PC file formats. With such a resource, a developer could easily import and export data, but my friend couldn't pry the information out of vendors. XML could provide such a bridge between formats.
Another benefit of XML, particularly for Web applications, is the ability of applications to manipulate data locally rather than just formatting and presenting it. Just by creating different XSL documents, data can be filtered and refined in different ways by the client application. This local computation saves the bandwidth that repeated calls to the server would otherwise consume, since raw data, marked up with XML, is transmitted once to the client for processing.
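The saving can be sketched in a few lines. Here Python stands in for the XSL documents: the data is parsed once, as if it had just arrived from the server, and several views are then derived locally with no further requests. The catalog vocabulary and the view names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, standing in for data the server sent once.
CATALOG = """\
<catalog>
  <book><title>XML Basics</title><subject>markup</subject><price>29.95</price></book>
  <book><title>SGML Handbook</title><subject>markup</subject><price>54.00</price></book>
  <book><title>Web Commerce</title><subject>business</subject><price>19.50</price></book>
</catalog>
"""

books = ET.fromstring(CATALOG).findall("book")  # parsed once, reused below


def titles(selection):
    """Return the title of each book in the selection."""
    return [b.findtext("title") for b in selection]


# Three different views of the same data, all computed on the client.
by_price = sorted(books, key=lambda b: float(b.findtext("price")))
markup_only = [b for b in books if b.findtext("subject") == "markup"]
under_30 = [b for b in books if float(b.findtext("price")) < 30]

print(titles(by_price))
print(titles(markup_only))
print(titles(under_30))
```

With formatted HTML, each of these views would typically mean another round trip to the server.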
These benefits don't mean that XML is a panacea. Its biggest benefits depend on cooperative, effective, and standard data vocabularies for industries and data uses. If one bookseller industry group defines a piece of data as "author" and another as "writer," applications will be no better off than they are now.
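The author-versus-writer problem is easy to reproduce. In this sketch, two hypothetical feeds describe the same book with different tag names; code written against one vocabulary finds nothing in the other until someone supplies a mapping by hand. The feeds, the ALIASES table, and the function names are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds describing the same book.
FEED_A = "<book><title>Hamlet</title><author>William Shakespeare</author></book>"
FEED_B = "<book><title>Hamlet</title><writer>William Shakespeare</writer></book>"


def author_of(xml_text):
    """Look only for <author>, as a consumer built for feed A would."""
    return ET.fromstring(xml_text).findtext("author")


# Hand-maintained mapping needed whenever vocabularies disagree.
ALIASES = {"writer": "author"}


def normalized_author(xml_text):
    """Rename known aliases to the expected tag before looking it up."""
    root = ET.fromstring(xml_text)
    for child in root:
        child.tag = ALIASES.get(child.tag, child.tag)
    return root.findtext("author")


print(author_of(FEED_A))
print(author_of(FEED_B))
print(normalized_author(FEED_B))
```

Every pair of disagreeing vocabularies needs its own mapping table, which is the integration cost that shared industry vocabularies are meant to remove.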
Several such industry-specific efforts are under way now. The Australia-New Zealand Land Information Council is working to describe geographic and spatial data types for use in XML-based systems. Educom's Instructional Management Systems Project has brought together a number of educational organizations to create a format for exchanging learning materials over the Internet. And the Utah Electronic Law Project wants to create standards for exchanging legal documents over the Net. There's also a version of XML under development for the chemicals industry.
The potential benefits of XML and these vertical industry initiatives are great, and it's quite likely that other industry sectors will follow suit as XML picks up steam.
Don Kiely is director of software technology at Information Insights, a Fairbanks, Alaska, consulting company specializing in application development. He can be reached at donkiely@computer.org.
Illustration by John Bleck