InformationWeek: The Business Value of Technology

May 25, 1998


The Making Of A Markup Language


Though few tools or applications now support XML, emerging standards will eventually usher in a new era of Web automation and interoperability

By Jason Levitt

When was the last time a markup language did anything useful for your company? HTML can paint some pretty pictures in your Web browser, but despite the addition of technologies such as Dynamic HTML, Cascading Style Sheets, and a raft of new tags, serious applications are moving a lot faster on the Web-server side than on the browser, which is burdened by legacy markup constructs and a static implementation of markup tags.

And what about Standard Generalized Markup Language? SGML, a pre-Web technology, became a standard in 1986 and has since seen widespread use in encoding documents of all types, but it hasn't generated nearly as much excitement in the business marketplace as its most recent offspring, XML.

You may have heard of XML, but chances are you haven't seen any yet. That's because XML, the Extensible Markup Language, is still in an embryonic period: Standards are being negotiated and tools are being created. Companies such as Adobe Systems, IBM, Microsoft, Netscape, and Sun Microsystems are proposing XML-based standards and, in a limited fashion, are beginning to support XML in their existing products. The result is that, even without current substantial implementations, the impact of XML on the Web, and on businesses in general, is clearly going to be far-reaching.

Reasons To Believe
There are good reasons for that. XML, which is really just a language for creating other markup languages, makes it easy to create structured documents, easily readable by humans, with tags that describe the content of the document. These documents can be exchanged easily and understood by properly written applications.

Search engines, intelligent agents, EDI applications, database repositories, query systems, and online catalogs are just a few of the many applications that will greatly benefit from XML's structured document definitions. Even outside the Web, in applications such as document management and the movement of legacy data from one system to another, applications will be able to harness the advantages of XML.

Two things are slowing down the adoption of XML on the Web. First, XML is a very new standard. It was just granted formal recommendation status in February by the World Wide Web Consortium, and it's still waiting on several infrastructure technologies to make it useful. Second, widespread development won't be galvanized until standardized markup languages are agreed upon and developed for vertical markets (see sidebar, "The Need For Standard Markup Languages").

How XML Works
The best way to get a feel for the power of XML is to consider a useful example. The language is well-suited for projects with large quantities of similarly structured data, such as the universe of articles published by the various magazines owned by CMP Media Inc., InformationWeek's parent company. In our fictitious example, CMP Media will create an in-house publishing system that will work with all of its magazines. In particular, CMP wants to use XML to track articles, manage copy flow, and easily create a rich, searchable database of articles. Fortunately, XML is a great choice for designing the document structures for all of these tasks.

Note that XML is not a programming language and therefore does not create executable binaries of any type. XML simply lets you define your tags and the relationship between them. The XML-encoded articles will have a rich structure that will make them easy to track, format, and manipulate. We'll have to create applications (probably written in Java or C++) that understand how to interpret and process our XML files.

Inside XML
XML files generally have two parts. One part is the XML tags and content itself; the other is the Document Type Definition that defines the tags and their relationships. The DTD can reside in the same file as the XML source or it can be in a separate file. The tables in the accompanying PDF file show a sample XML file for InformationWeek's InternetView column. The first box of sample code shows the file column.xml, a recent InternetView column encoded in XML. (The body of the column is abbreviated in this example.)

The XML file is easily readable by humans (who can read English). I made up the various markup tags, such as <author>, and it is clear what they refer to. Perhaps the only difficult lines are the first two. The first declares that the file is XML 1.0 compliant and that it is not a standalone file (it depends on the file column.dtd). The second defines the location of the DTD; in this case, it's in the file named column.dtd, which is displayed just below the XML code.
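Since the sample file lives in the accompanying PDF, here is a minimal sketch of what such a file and a small XML-aware program might look like. It is written in Python rather than Java or C++ purely for brevity. The element names follow the DTD described in this article; the attribute names (TAGLINE, VERSION, DATE) and every value are invented for illustration, and the summarize function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for column.xml. Element names follow the article's
# DTD description; attribute names and all values are invented.
COLUMN_XML = """\
<?xml version="1.0" standalone="no"?>
<!DOCTYPE COLUMN SYSTEM "column.dtd">
<COLUMN TAGLINE="InternetView" VERSION="1" DATE="1998-05-25">
  <HEADLINE>Example headline</HEADLINE>
  <AUTHOR>Example Author</AUTHOR>
  <AUTHORPHOTO>author.gif</AUTHORPHOTO>
  <COLUMNBODY>Body text of the column.</COLUMNBODY>
  <SIGNATURE>Example signature line</SIGNATURE>
</COLUMN>
"""

def summarize(xml_text):
    """Return the fields an article-tracking system might index."""
    # This parser does not fetch or check the external DTD named in DOCTYPE.
    root = ET.fromstring(xml_text)
    return {
        "tagline": root.get("TAGLINE"),
        "date": root.get("DATE"),
        "headline": root.findtext("HEADLINE"),
        "author": root.findtext("AUTHOR"),
    }

print(summarize(COLUMN_XML))
```

The first two lines of the embedded document are the XML declaration and the pointer to column.dtd discussed above. Everything after them is ordinary tagged content that a program can pick apart by name.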

DTD files define the tags and structure of the associated XML file but, unlike XML files, they are clearly not meant to be read by humans. The first ELEMENT statement defines the order in which the other elements must appear in the XML file.

In this case, the order is COLUMN, HEADLINE, AUTHOR, AUTHORPHOTO, COLUMNBODY, and SIGNATURE. The COLUMN tag also has three attributes: the tagline, which tells us it's the InternetView column; the current version of the column; and the date of the issue in which the column will appear. Each of the attributes is of type CDATA (Character Data)--a text string that does not get interpreted by the XML parser.
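To make the ordering rule concrete, the sketch below reconstructs a plausible column.dtd from the description above and checks a document's child elements against it. The attribute names, the #REQUIRED defaults, and the content models of the child elements are assumptions, and the two helper functions are hypothetical. This is a toy order check in Python, not a validating XML parser; it understands only a simple comma-separated content model.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of column.dtd; attribute names, #REQUIRED
# defaults, and the child content models are assumptions.
COLUMN_DTD = """\
<!ELEMENT COLUMN (HEADLINE, AUTHOR, AUTHORPHOTO, COLUMNBODY, SIGNATURE)>
<!ATTLIST COLUMN
    TAGLINE CDATA #REQUIRED
    VERSION CDATA #REQUIRED
    DATE    CDATA #REQUIRED>
<!ELEMENT HEADLINE    (#PCDATA)>
<!ELEMENT AUTHOR      (#PCDATA)>
<!ELEMENT AUTHORPHOTO (#PCDATA)>
<!ELEMENT COLUMNBODY  (#PCDATA)>
<!ELEMENT SIGNATURE   (#PCDATA)>
"""

def required_children(dtd_text, element):
    """Read a simple comma-separated content model such as (A, B, C)."""
    pattern = r"<!ELEMENT\s+%s\s+\(([^)]*)\)>" % re.escape(element)
    match = re.search(pattern, dtd_text)
    if match is None:
        raise ValueError("no ELEMENT declaration for " + element)
    return [name.strip() for name in match.group(1).split(",")]

def children_in_order(xml_text, dtd_text):
    """True if the root's children match the declared sequence exactly."""
    root = ET.fromstring(xml_text)
    found = [child.tag for child in root]
    return found == required_children(dtd_text, root.tag)

good = "<COLUMN><HEADLINE/><AUTHOR/><AUTHORPHOTO/><COLUMNBODY/><SIGNATURE/></COLUMN>"
bad = "<COLUMN><AUTHOR/><HEADLINE/><AUTHORPHOTO/><COLUMNBODY/><SIGNATURE/></COLUMN>"
print(children_in_order(good, COLUMN_DTD), children_in_order(bad, COLUMN_DTD))
```

A real validating parser would also enforce the attribute list and the content of each child element, which this sketch ignores.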

All of the other elements are of type PCDATA (Parsed Character Data), and may contain HTML tags or other markup information. The XML parser will check the tags in the PCDATA string to ensure that they adhere to XML syntax rules (all tags must have a closing tag, for example).
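The closing-tag rule is easy to see in practice. The sketch below, in Python with a hypothetical is_well_formed helper, feeds two invented COLUMNBODY fragments to a stock XML parser. It checks syntax only and does not consult any DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML; syntax only, no DTD is consulted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# HTML-style markup is accepted when every tag is closed...
closed = "<COLUMNBODY>XML is <B>new</B>.<BR/></COLUMNBODY>"
# ...but the unclosed <BR> that is common in HTML is rejected.
unclosed = "<COLUMNBODY>XML is <B>new</B>.<BR></COLUMNBODY>"

print(is_well_formed(closed), is_well_formed(unclosed))
```

This is the practical difference from HTML, where browsers quietly tolerate a missing close tag.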

The DTD file in this example will give our custom XML-aware application a clear view of the document structure and, through a definition of the tags, a meaning to its contents, but it doesn't give any clue as to the format of the document. There isn't any information, such as you might find in a Microsoft Word file, about what font is being used, which characters are bold, or how they are justified in the document.

These types of display issues will be handled in the near future by style sheets. A standard, called Extensible Style Sheet Language, is expected to be issued as a W3C Proposed Recommendation in about a year. XSL will be based on the SGML style sheet standard, DSSSL, and will be compatible with CSS, the HTML style-sheet standard. Style sheets will typically be kept in external files and referenced from within an XML file with a line such as this:
<?xml-stylesheet href="column.xsl" type="text/xsl"?>
The diagram includes a simple XSL style sheet that could be used to display the column in a Web browser. This XSL style sheet uses CSS flow conventions and HTML tags.
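Because the XSL draft is still in flux, the sketch below does not attempt XSL syntax. It only illustrates the division of labor a style sheet provides: the XML carries structure, and a separate set of rules decides how each element is displayed. The STYLE table, the render function, and the choice of HTML tags are all hypothetical, and the code is Python rather than anything a browser would run.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical presentation rules: element name -> HTML tag used to display it.
STYLE = {
    "HEADLINE": "h1",
    "AUTHOR": "h3",
    "COLUMNBODY": "p",
    "SIGNATURE": "i",
}

def render(xml_text, style):
    """Emit one HTML element per styled child; unstyled children are skipped."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        tag = style.get(child.tag)
        if tag is not None:
            parts.append("<%s>%s</%s>" % (tag, escape(child.text or ""), tag))
    return "\n".join(parts)

doc = (
    "<COLUMN><HEADLINE>Example headline</HEADLINE>"
    "<COLUMNBODY>Body &amp; text.</COLUMNBODY></COLUMN>"
)
print(render(doc, STYLE))
```

Swapping in a different STYLE table changes the display without touching the document, which is the point of keeping style sheets in external files.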

In general, XSL style sheets will be much more complicated than this example and will rarely, if ever, be written completely by hand. Tools such as ArborText's XMLStyler program offer a graphical user interface for creating XSL style sheets.

Standards groups are working to make XML as Web-friendly as possible. It's even possible, though it involves awkward syntax, to embed JavaScript scripts in XML files. The preferred way is to put the JavaScript in external files and then refer to the scripts by using the SRC attribute of the HTML <script> tag in your XML files.

Unfortunately, scripting won't be practical in XML documents until the DOM (Document Object Model) and Style Sheet standards for XML are formalized.

For the sake of simplicity, the DTD used in the example in this article applies only to a single type of document. If we were creating a real XML solution for our publishing environment, it would be best to have a single, generalized DTD that encompassed all the types of articles and documents to be produced, instead of a DTD for each type of article.

Not Much Choice In Tools
The newness of XML itself and the dearth of standardized markup languages written using XML make for slim pickings in the XML tools and application market. The interesting activity is occurring in the development tools sector, though, where XML parsers, editors, and toolkits are starting to arrive.

Still, it's nothing yet like the deluge of authoring products that swamped the HTML market--but give it six months or so. Most of the action has come from companies with solid existing SGML products such as ArborText and Microstar, though these initial offerings aren't going to impress developers.

ArborText Inc.'s XMLStyler is currently the only XSL style-sheet editor, but the XSL standard is probably nine months or more away from being finalized. Thus, XMLStyler is more of an exercise in style-sheet vocabulary than a serious development tool.

Similarly, Vervet Logic's XML <pro>, currently in testing, is a simple GUI that lets you edit the document tree-structure of a DTD, as well as validate it against an XML file. XML <pro> is a great alternative to all those Java-based command-line XML parsers out there and is a good tool for introductory dabbling with XML and DTDs.

Microstar Software Ltd.'s Near and Far Designer 3.0 is perhaps the most mature product in the group, since it started life as an SGML DTD editor. In its 3.0 incarnation, it can still validate and help you build SGML DTDs; it can also help you build XML DTDs as well as translate SGML DTDs to XML DTDs and provide error reporting in the process. This makes it a particularly valuable tool for SGML shops that need to start experimenting with moving DTDs to XML.

DataChannel Inc.'s XML Developer's Kit, though aimed at its RIO product, which is an Internet Explorer 4.0 channel manager, contains a number of useful components for XML developers, including the validating DataChannel XML Parser (DXP), the XML Generator, which can batch translate comma-delimited data files into XML, and the DOM-Builder, an XML editor based on the proposed XML DOM.
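The idea behind batch translation of comma-delimited data is simple enough to sketch. The Python below is not DataChannel's XML Generator and makes no claim about how that product works. The csv_to_xml function, the ARTICLES and ARTICLE wrapper tags, and the sample data are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, root_tag="ARTICLES", row_tag="ARTICLE"):
    """Turn each CSV row into an element whose children are named by the header."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return ET.tostring(root, encoding="unicode")

sample = "HEADLINE,AUTHOR\nExample headline,Example Author\n"
print(csv_to_xml(sample))
```

The sketch assumes every column header is already a legal XML element name. A production tool would have to rename or reject headers containing spaces or punctuation.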

On the browser front, Microsoft has delivered an XML parser for Internet Explorer 4.0 (www.microsoft.com/xml), and it is possible to write XML-aware applications that make use of IE 4.0's DOM. Netscape has indicated that Communicator 5.0 (Mozilla) will include an XML parser as well as support for XSL and RDF (www.mozilla.org/rdf/doc/xml.html).

Conclusion
XML is going to be a very important technology for Web-based applications of all types. All that's needed are some related infrastructure standards and some robust standard markup languages for vertical applications. For the time being, development tools are starting to arrive--and with them, the usual excitement that follows significant developer product releases, and, hopefully, new XML applications.

