Could IBM's Viper be the database world's holy grail?
For years, the big database vendors have said that their relational offerings also support unstructured XML data. And so they do. Kind of.
But IBM says its upcoming Viper release of DB2 will eliminate the need to force-fit unstructured data into some facsimile of the row-and-column format that relational databases can deal with. Instead, it will store both structured and unstructured data in their respective native formats and allow both types of data to be queried via SQL or XQuery statements.
So far, bridging that structured/unstructured divide has proven problematic, even for the vendors that claim XML support in their RDBMSes.
"Today DB2 has XML support, as do Oracle and Microsoft," said Janet Perna, general manager of IBM Software's information management division. "But we all treat [that data] as if it's relational. XML looks like a tree; it is a hierarchy, not a table."
Timing is everything. If IBM can deliver Viper, it could steal street cred from both of its chief database rivals, Oracle and Microsoft, especially as Microsoft struggles to deliver its long-delayed SQL Server 2005, or Yukon, now due by year's end.
Vendor posturing aside, the beauty of a true hybrid database is that it would enable users to work with both types of data via SQL or XQuery requests. It could also open up the sometimes-arcane world of database applications to developers and ISVs with experience in other data types and applications.
"Viper is pretty huge for us," said Paul Chan, vice president of marketing at PureEdge, an electronic forms ISV that is working with IBM. "We've always had this vision that there would be infrastructure support for XML [in the database]." The big impact on PureEdge's customers is that they can build applications that can be used in different ways. "Usually an application is built for one purpose but is later used for many. This will make it far easier to extend processes and the original applications," he noted.
Most database jockeys would be glad to get rid of "shredding," the process of breaking unstructured data apart so it fits into a structured database's tables. They would also happily abandon the practice of having the database glom unstructured information into a single Character Large Object (CLOB) or its binary counterpart, the Binary Large Object (BLOB). Either way, the data is handled, but not natively.
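To make the two workarounds concrete, here is a minimal Python sketch using the standard library's SQLite and ElementTree modules. SQLite is only a stand-in for a commercial RDBMS, a TEXT column plays the role of a CLOB, and the purchase-order form and its table and column names are hypothetical. It shows the trade-off the paragraph describes: shredded rows are easy to query with SQL but no longer form the original document, while the CLOB keeps the document intact but is opaque to SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase-order e-form; element and attribute names are invented.
FORM = """<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="5"/>
</order>"""

def shred(conn, xml_text):
    """Shredding: break the XML tree apart into rows of two relational tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order_id, root.findtext("customer")))
    for item in root.findall("item"):
        conn.execute("INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                     (order_id, item.get("sku"), int(item.get("qty"))))

def store_as_clob(conn, xml_text):
    """CLOB-style storage: keep the whole document as one opaque text value."""
    conn.execute("INSERT INTO order_docs (doc) VALUES (?)", (xml_text,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
CREATE TABLE order_docs (doc TEXT);  -- TEXT stands in for a CLOB column
""")
shred(conn, FORM)
store_as_clob(conn, FORM)

# Shredded data answers SQL queries directly, but getting the original
# document back would mean reassembling it from rows in two tables.
total_qty = conn.execute(
    "SELECT SUM(qty) FROM order_items WHERE order_id = 1001").fetchone()[0]

# The CLOB returns the document text unchanged, but to SQL it is just a
# string: querying inside it means fetching and parsing it all over again.
doc = conn.execute("SELECT doc FROM order_docs").fetchone()[0]
clob_qty = sum(int(i.get("qty")) for i in ET.fromstring(doc).findall("item"))
```

Both paths report a total quantity of 7, but each pays for it differently: shredding gives up the document, and the CLOB gives up the database's ability to look inside it.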
For example, Redwood Shores, Calif.-based Oracle's database lets "you extract info out of the database and publish it as XML but doesn't store it that way—it stores it in a PL/SQL package. For my money, that's fine for most applications. Or you make it into a BLOB," said one large Oracle partner.
Sandeepan Banerjee, Oracle's director of product management for XML, says his company offers both a "hybridized relational model," modified to include XML constructs, and shredding. A lot of the discussion around "who's more native" is beside the point, Banerjee said. "The proof is in what you can do with the database."
Customers using Oracle's XML capability have achieved 2,500 database transactions per second in production mode, he maintained.
But others say those techniques do not cut it.
"Shredding is really a hack," Chan said. "Just a way to get data in and out. It's not the same as doing it natively. Viper will be much faster, it will store documents in native format vs. putting them all into some sort of BLOB."
With a native XML data store such as Viper's, the data remains in XML format as it flows from client to disk and back again, meaning that both the logical and physical data models are XML. This obviates the need for shredding.
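The idea of a store in which documents never leave their hierarchical form can be illustrated with a toy Python sketch. It is an illustration of the concept only, not a description of how Viper works internally: the class and its methods are hypothetical, documents are held in memory as parsed ElementTree trees, and ElementTree's limited XPath subset stands in for XQuery.

```python
import xml.etree.ElementTree as ET

class ToyNativeXMLStore:
    """Hypothetical toy store: documents stay as parsed trees, never shredded into rows."""

    def __init__(self):
        self._docs = []  # each entry is the root Element of one stored document

    def insert(self, xml_text):
        # Parse once on the way in; what is kept is the hierarchy itself.
        self._docs.append(ET.fromstring(xml_text))

    def query(self, path):
        """Evaluate a path expression against every stored document
        (ElementTree's XPath subset is a stand-in for XQuery here)."""
        results = []
        for root in self._docs:
            results.extend(root.iterfind(path))
        return results

    def fetch(self, index):
        # Hand the document back as XML; no reassembly from tables is needed.
        return ET.tostring(self._docs[index], encoding="unicode")

store = ToyNativeXMLStore()
store.insert('<order id="1001"><customer>Acme Corp</customer>'
             '<item sku="A-1" qty="2"/><item sku="B-7" qty="5"/></order>')
store.insert('<order id="1002"><customer>Globex</customer>'
             '<item sku="A-1" qty="1"/></order>')

# Find every item with SKU A-1 across both documents by walking the trees.
a1_items = store.query(".//item[@sku='A-1']")
a1_qty = sum(int(e.get("qty")) for e in a1_items)
```

Here the query walks the tree directly, so there is no mapping step between the XML a client sends and the form in which the data is stored and searched.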
A limited-beta version of Viper is in the hands of about 30 customers and partners, said Bob Picciano, vice president of database servers at IBM. Full beta is slated for the second half of the year and general availability sometime in 2006, IBM said.
The key is to offer fast, native support for XML, "the lingua franca" for unstructured data, said Perna. She acknowledges that much of that data does not originate in XML format but can at least be mapped into it. "That is why we're enhancing DB2 with native XML support," she said.
Most estimates put the share of corporate data that is unstructured at 80 percent to 85 percent: think of all those Word or WordPerfect documents, e-forms, PDF files and content management systems out there. Of course, the remaining 15 percent to 20 percent of the information, residing in databases or other repositories, is also critical. A database that could truly marry the two worlds so that they could be managed and accessed uniformly is, to many, a database nirvana, one that would enable database administrators to continue to use their skills, and those with XML expertise to do likewise.