IBM says its upcoming Viper release of DB2 will bridge the structured/unstructured divide and allow both types of data to be queried via SQL or XQuery statements.

Barbara Darrow, Contributor

May 20, 2005


Could IBM's Viper be the database world's holy grail?

For years, the big database vendors have said that their relational offerings also support unstructured XML data. And so they do. Kind of.

But IBM says its upcoming Viper release of DB2 will eliminate the need to force-fit unstructured data into some facsimile of the row-and-column format that relational databases can deal with. Instead, it will store both structured and unstructured data in their respective native formats and allow both types of data to be queried via SQL or XQuery statements.

So far, bridging that structured/unstructured divide has proven problematic, even for the vendors that claim XML support in their RDBMSes.

"Today DB2 has XML support, as does Oracle and Microsoft," said Janet Perna, general manager of IBM Software's information management division. "But we all treat [that data] as if it's relational. XML looks like a tree; it is a hierarchy, not a table."

 


Should IBM actually succeed with Viper, due next year, it would one-up database kingpin Oracle as well as Microsoft, which has pushed its SQL Server franchise successfully into small and midmarket companies, and into departments within the enterprise where Armonk, N.Y.-based IBM has its greatest strength.

Timing is everything. If IBM can deliver Viper, it could steal street cred from both database rivals, especially as Microsoft struggles to deliver its long-delayed SQL Server 2005, or Yukon, now due by year's end.

Vendor posturing aside, the beauty of a true hybrid database is that it would enable users to work with both types of data via SQL or XQuery requests. It could also open up the sometimes-arcane world of database applications to developers and ISVs with experience in other data types and applications.

"Viper is pretty huge for us," said Paul Chan, vice president of marketing at PureEdge, an electronic forms ISV that is working with IBM. "We've always had this vision that there would be infrastructure support for XML [in the database]." The big impact on PureEdge's customers is they can build applications that can be used in different ways. "Usually an application is built for one purpose but is later used for many. This will make it far easier to extend processes and the original applications," he noted.

Most database jockeys would be glad to get rid of "shredding," the process of breaking unstructured data apart to fit the rows and columns of a structured database. The same goes for the practice of having the database glom unstructured information into a single large-object column, either a Character Large Object (CLOB) or a Binary Large Object (BLOB). Either way, the data is handled, but not natively.

For example, Redwood Shores, Calif.-based Oracle's database lets "you extract info out of the database and publish it as XML but doesn't store it that way—it stores it in a PL/SQL package. For my money, that's fine for most applications. Or you make it into a BLOB," said one large Oracle partner.

Sandeepan Banerjee, Oracle's director of product management for XML, says his company offers both a "hybridized relational model" modified to include XML constructs as well as shredding. A lot of the discussion around "who's more native" is beside the point, Banerjee said. "The proof is in what you can do with the database."

Customers using Oracle's XML capability have achieved 2,500 database transactions per second in production mode, he maintained.

But others say those techniques do not cut it.

"Shredding is really a hack," Chan said. "Just a way to get data in and out. It's not the same as doing it natively. Viper will be much faster, it will store documents in native format vs. putting them all into some sort of BLOB."

With a native XML data store like in Viper, the representation of the data as it flows from client to disk and back again remains in XML format—meaning that both the logical and physical data models are XML. This obviates the need for shredding.
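To make the contrast concrete, here is a minimal Python sketch of the two approaches, using only the standard library. It is not Viper or DB2 code: the e-form document, its element names, and the table layout are all invented for illustration, and the path query uses ElementTree's limited XPath subset as a stand-in for XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical e-form document; element and attribute names are invented.
FORM_XML = """<claim id="C-100">
  <customer>
    <name>Ada Smith</name>
    <city>Armonk</city>
  </customer>
  <items>
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </items>
</claim>"""


def shred(xml_text, conn):
    """Shredding: flatten the XML hierarchy into relational rows."""
    root = ET.fromstring(xml_text)
    conn.execute("CREATE TABLE claim (id TEXT, name TEXT, city TEXT)")
    conn.execute("CREATE TABLE claim_item (claim_id TEXT, sku TEXT, qty INTEGER)")
    conn.execute(
        "INSERT INTO claim VALUES (?, ?, ?)",
        (root.get("id"), root.findtext("customer/name"), root.findtext("customer/city")),
    )
    conn.executemany(
        "INSERT INTO claim_item VALUES (?, ?, ?)",
        [(root.get("id"), item.get("sku"), int(item.get("qty"))) for item in root.iter("item")],
    )


def skus_from_rows(conn, claim_id):
    """Query the shredded copy with SQL."""
    cur = conn.execute(
        "SELECT sku FROM claim_item WHERE claim_id = ? ORDER BY sku", (claim_id,)
    )
    return [row[0] for row in cur]


def skus_from_tree(xml_text):
    """Query the document in its own tree shape with a path expression."""
    root = ET.fromstring(xml_text)
    return sorted(item.get("sku") for item in root.findall("./items/item"))
```

Both functions return the same SKUs, but the shredded route needs a table design and mapping code for every document shape, while the path query works against the hierarchy as written.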

A limited-beta version of Viper is in the hands of about 30 customers and partners, said Bob Picciano, vice president of database servers at IBM. Full beta is slated for the second half of the year and general availability sometime in 2006, IBM said.

The key is to offer fast, native support for XML, "the lingua franca" for unstructured data, said Perna. She acknowledges that a lot of that data is not originally in XML format, but at least can be mapped into it. "That is why we're enhancing DB2 with native XML support," she said.

Most estimate that 80 percent to 85 percent of corporate data is unstructured: think of all those Word or WordPerfect documents, e-forms, PDF files and content management systems out there. Of course, the remaining 15 percent to 20 percent of the information, residing in databases or other repositories, is also critical. A database that could truly marry the two worlds so that they could be managed and accessed uniformly is, to many, a database nirvana, one that would enable database administrators to continue to use their skills, and those with XML expertise to do likewise.

All of this is a linchpin of IBM Chairman Sam Palmisano's on-demand worldview. What is more important to have at your immediate disposal than relevant data served up accurately regardless of where it originated or what format it is in?

Some integration partners think IBM has a real shot with this project. John Parkinson, CTO of the Americas at Capgemini, New York, likes what he's seen of Viper and agrees that there is potentially a big market for a database that does what Viper claims. "There is increasing demand for such things because while the amount of SQL data goes up linearly, the amount of non-SQL data grows exponentially. Being able to query both types of data the same way is a huge deal," he said.

While Viper is still in early test phase, much of the groundwork has been done in advance, said IBM's Picciano. He cites the addition of the Starburst optimizer to DB2 back in its version 5 release. (The current version is 8.2.) Starburst "lets us add extensibility in the form of abstract data types and other user-defined functions called extenders." He likened that work to what Informix, which has since been acquired by IBM, had done with its blades technology.

These extenders enabled IBM to implement a new "canonical format" to represent any query or language that interacts with the database's Query Graph Model.

In addition, IBM has adapted its data manager, the software that lays out information on the disk. "In relational format, they're constructed in a certain way to mimic the way people interact with data—either OLTP [online transaction processing] or analysis data. We looked at those pages and decided we needed a new model so people can use multiple schemas to interact with the database," Picciano said.

He says all of this work in the guts of the system will mean less work for ISVs, VARs and others who work with the database itself. "This is not like going from 32- to 64-bit apps. You won't need to rewrite. There will be a seamless path, you can just start adding XML," he said.

There was other foundation rework done in the rest of IBM's portfolio, Perna said. "It's throughout IBM. You start at the metal, at the storage devices with work around virtualization of SANs. Up from that layer you get technologies like backup and recovery and archiving and replication through Tivoli storage management then up a layer to information management, the database, content management and information integration work we've done. Then you make all that available through technologies like Workplace solutions for access at any time."

That sounds like a lot of buy-in to one vendor's offerings. But Perna also reiterates IBM's pledge that federated systems will ease adoption and avoid vendor lock-in. "You can have an application written to SQL Server and want to join some customer records in a portal app—the data about your customer is in Oracle and Microsoft. So, you write an app in standard SQL and can join that information with another data source—maybe some XML documents. Without changing the application, you work through Information Integrator to join the XML documents with Oracle SQL data," she said.
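The kind of join Perna describes can be sketched in a few lines of Python. This is only an illustration of the idea of joining relational rows with XML documents on a shared key. It does not use Information Integrator or any IBM API, and the customer table, ticket documents, and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_customer_db():
    """Hypothetical relational source: an in-memory customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return conn


# Hypothetical XML source: support tickets keyed by customer id.
TICKETS_XML = """<tickets>
  <ticket customer="1"><subject>Late shipment</subject></ticket>
  <ticket customer="2"><subject>Billing question</subject></ticket>
  <ticket customer="1"><subject>Password reset</subject></ticket>
</tickets>"""


def federated_join(conn, xml_text):
    """Return (customer name, ticket subject) pairs joined on customer id."""
    # Pull the relational side into a lookup keyed by customer id.
    names = dict(conn.execute("SELECT id, name FROM customer"))
    root = ET.fromstring(xml_text)
    pairs = []
    for ticket in root.findall("ticket"):
        cust_id = int(ticket.get("customer"))
        if cust_id in names:  # inner-join semantics: skip unmatched tickets
            pairs.append((names[cust_id], ticket.findtext("subject")))
    return pairs
```

In this sketch the application does the joining itself. The pitch for a federation layer is that the same result would come back from a single query, with the application unaware of where each source lives.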

Picciano maintains current apps will run great hitting against this database, but most importantly this "breakthrough" that is Viper will knock down the barriers that have been built up between silos of information residing in different apps and repositories. "This will let you do information in context," Picciano said. "XML preserves a great deal of meta-data about the information itself and about its usage and relevance to other applications and data. This will bridge those silos and provide better insight for doing business process optimization across applications."

For example, a large enterprise could take human resources data, much of it nonrelational and stored in formlike templates, and get it to work well with OLTP and warehouse-type apps that have typically tapped relational data. "You'll be able to intersperse XML data representing a variety of things, put that right into the database with the understanding of how and where it can be used. This is a very, very seminal technology, taking on the big problem of semantic integration," he said.

Perna says that Viper will strengthen IBM's whole information management portfolio. "DB2 is the foundation of much of what we're doing in information integration, it is the foundation of our content management offerings and will play a key role as we bring together relational and nonrelational data."

IBM execs also maintain—and this is a big assertion—that performance will not be victimized in the process. Typically, multifaceted products tend to sacrifice performance as they get more broad-based.

On the other hand, they assert that even a truly fast database supporting both relational and nonrelational data will not necessarily displace the need for RDBMS per se. "We'd love to have DB2 be the standard for storing those XML documents, however, I have been forthright in saying I don't believe all information will be in one place. That's Oracle's shtick. Data will be where it is," Perna said.

Redmonk analyst Stephen O'Grady agrees. "Do I see that as a major threat to all RDBMSes? No. But are there people who may want to transition their stores to a more XML-based data model? Perhaps. XML technology, whether it's from IBM or otherwise, is more about greenfield scenarios. There are so many massive packaged applications out there that there will still be tremendous demand for relational service," he noted.

Some are unconvinced that this really will be the holy grail.

George Brown, president of Database Solutions, a Cherry Hill, N.J., solution provider, is unimpressed with all the structured/unstructured rhetoric. "I really don't think this is that big a deal. All of the databases have some XML support now—enough to meet most needs." Even if IBM comes up with something truly radical, he is unconvinced of the burning need. "The more technology you spread around, the thinner it gets."

Phil Mogavero, president of Database Systems Worldwide, Woodland Hills, Calif., said IBM has to get its sales and partner strategy in order before he'd take on even the best and brightest new product. While IBM Software has made inroads recruiting ISVs and other partners, Mogavero thus far is unmoved by its pledges of channel faith. "We don't sell DB2 today because IBM is so schizophrenic. They want to be a partner, then a competitor, then an outsourcer, then an insourcer. At least with Oracle you know where you stand. They're a direct-sales company with a few people trying to make it a channel company. Microsoft is a channel company that wants to work through volume channels that's trying to get better in more consultative sales."

Still, many see this as an evolution of the overall category. No one expects IBM will be the only database vendor to come up with a hybrid. "It's so competitive in the database field, the dominos will fall pretty fast, things will equalize fairly quickly," said PureEdge's Chan.


IBM's Current and Future Info Management Family


> Hybrid database (Viper) will handle structured SQL and unstructured XML data natively; queryable by SQL or XQuery. Now in alpha; beta slated for second-half 2005, general availability 2006.

> DB2 8.2, the current relational database, available in several configurations.

> DB2 version 8.2.2 with 'SAP tuner' to autoconfigure database for optimal use in SAP environments. Available now.

> Cloudscape, the Java-based embeddable database acquired along with Informix. Available commercially from IBM. Open-source version (Derby) is maintained by the Apache Software Foundation.

> WebSphere Information Integrator's current release (Masala) has been available since fall 2004. Allows federated search of structured and unstructured data across repositories.

> Next release of Information Integrator software (Serrano), in alpha now; to ship by year's end. Adds relationship tracking between data residing in different applications and repositories and facilitates 'actionable' search.

> WebSphere Data Integration Suite (Hawk) will be available by year-end. Builds on Ascential's data cleansing and integration know-how.

> IBM DB2 Content Manager 8.3 features automated records control, and capture and management of XML documents. Available now.

> DB2 Data Warehouse Edition 8.2.1 integrates Alphablox analytics into its data mining; online analytical processing (OLAP); and extract, transform and load (ETL) capabilities. Available now.
