Content Pipeline

When you need to connect disparate content stores, you can use content integration software, federated search, portals or a newer option called enterprise information integration (EII). Most firms should use a combination of approaches.
Although the firm had the tools and human resources to handle some categorization of information, users lacked the ability to perform open-ended searches across all content stores. In September 2003, Cleary, Gottlieb chose MindServer federated search software from Recommind to crawl all of these sources.

"Because of the dominance of Lexis/Nexis and Westlaw, every lawyer has been taught how to use search tools," Miller says. "Having strong search capabilities across collections is a natural and necessary tool that lawyers expect."

The MindServer project is still in its pilot stage, but Miller expects the federated search to be rolled out to all 2,000 users by the end of the year. Miller declined to disclose how much the firm has spent but says the search capabilities were a requirement.

"You have to give users tools to access information themselves so they're not dependent on others," Miller says. "It's not that there will be a tangible ROI that we can document, it's a necessity to make this data accessible."

Portals Open Doorways into Content

Portal software provides doorways (or windows, depending on which metaphor you prefer) to multiple applications. "Portlets" or "gadgets" for different applications and sources deliver portions of data, content or code to the user's desktop. The search function in most portals provides read-only, one-way access to content. The content isn't integrated, but it does coexist in a single visual presentation. In some applications, such as executive dashboards of company performance, this on-screen integration is all users need.

Portal technology is an alternative to content integration when you need to access and aggregate information, yet Lachal of Ovum says the two approaches can also be complementary.

"Portal technology can rely on content integration software, enterprise application integration and EII to more easily access back-end systems," he says.

Portals often provide a richer user experience than content management software alone, but each portlet is limited to displaying information from a single back-end repository. To become true content integration tools, portlets need a layer of abstraction so you don't have to write a separate portlet for each back-end system you want to expose through the portal. With that abstraction in place, you could build one universal integration portlet.
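To make the abstraction idea concrete, here is a minimal Python sketch. It is not any vendor's actual portlet API: the `RepositoryAdapter` interface, the two adapter classes and `universal_portlet` are all hypothetical names. Each back end is wrapped once in an adapter that returns items in a common shape, so a single portlet can render any repository without repository-specific code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Repository-neutral view of one document or record."""
    source: str
    title: str
    url: str


class RepositoryAdapter(ABC):
    """Hypothetical abstraction layer: one adapter per back-end system."""

    name: str

    @abstractmethod
    def list_items(self) -> list[ContentItem]:
        ...


class FileShareAdapter(RepositoryAdapter):
    """Illustrative adapter for a network file share (paths supplied directly)."""
    name = "file share"

    def __init__(self, paths: list[str]):
        self.paths = paths

    def list_items(self) -> list[ContentItem]:
        # Use the last path component as the display title.
        return [ContentItem(self.name, p.rsplit("/", 1)[-1], "file://" + p)
                for p in self.paths]


class DocumentStoreAdapter(RepositoryAdapter):
    """Illustrative adapter for a document repository (id -> title mapping)."""
    name = "document store"

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def list_items(self) -> list[ContentItem]:
        return [ContentItem(self.name, title, f"docstore://{doc_id}")
                for doc_id, title in self.docs.items()]


def universal_portlet(adapters: list[RepositoryAdapter]) -> str:
    """Render one read-only listing for any number of back ends."""
    lines = []
    for adapter in adapters:
        for item in adapter.list_items():
            lines.append(f"[{item.source}] {item.title} <{item.url}>")
    return "\n".join(lines)


rendered = universal_portlet([
    FileShareAdapter(["/shares/eng/spec.doc"]),
    DocumentStoreAdapter({"0901": "Safety report"}),
])
print(rendered)
```

Adding a new back end then means writing one adapter rather than one portlet; the portlet itself never changes.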

For now, portal vendors including IBM, Plumtree, Tibco, BEA and others offer many prebuilt integrations with content management systems. BEA's WebLogic Portal, for example, has a Virtual Content Repository that works with software from FileNet, Documentum and FatWire.

Westinghouse deployed a Plumtree portal four years ago, in part to give customers access to documents over its extranet and eliminate the need to mail CDs, send e-mail and transfer files. More recently, the company has begun turning the portal inward to give 700 employees around the world access to diverse content. So far, about 170,000 documents stored on Windows NT and Unix network file shares, in SQL Server databases and in other databases have been exposed through the portal.

"We have huge volumes of information on network file shares that are difficult to access — you have to have permissions, you have to know where everything lives, you have to have a map to them," says software engineer Darlene Daverio, adding that the portal provides faster, easier access.

Westinghouse stores more than a million documents in EMC's Documentum software, the corporate standard for certain classes of documents. The portal normally provides access to those documents and others stored in Lotus Notes, but those connections recently broke when the company upgraded to the latest version of the Plumtree portal. Westinghouse is waiting for new portlets to get these repositories plugged back into the portal. This hiccup aside, the portal has aggregated huge volumes of disparate content in one place.

"The end user doesn't have to know where [content] lives and how to navigate the GUI of each of those back-end systems," Daverio says. "Now we're able to use a standard interface for all different systems and search through all of them from one place. I can do a keyword search against NT file shares, Documentum and Web sites all at the same time. It would be [much more] labor intensive to search those independently."
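The single-search-box behavior Daverio describes can be sketched as a query-time fan-out: send the same keyword to every source in parallel, then merge the hits, each tagged with its origin. This is one possible design, not a description of how Plumtree implements it; many products instead crawl and index sources ahead of time. The `SOURCES` dictionary is a hypothetical in-memory stand-in for file shares, Documentum and Web sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for back-end systems; a real deployment would call
# each repository's own search interface instead of scanning a list.
SOURCES = {
    "NT file share": ["pump maintenance schedule", "plant layout drawing"],
    "Documentum": ["pump qualification report", "fuel handling procedure"],
    "Web site": ["press release: new pump design"],
}


def search_source(name: str, docs: list[str], keyword: str) -> list[tuple[str, str]]:
    """Case-insensitive keyword match within a single source."""
    kw = keyword.lower()
    return [(name, doc) for doc in docs if kw in doc.lower()]


def federated_search(keyword: str) -> list[tuple[str, str]]:
    """Query every source in parallel and merge the tagged hits in source order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, docs, keyword)
                   for name, docs in SOURCES.items()]
        results: list[tuple[str, str]] = []
        for future in futures:
            results.extend(future.result())
    return results


for source, title in federated_search("pump"):
    print(f"{source}: {title}")
```

Because each source is queried independently, a slow or unavailable repository affects only its own results; a production version would also need per-source timeouts and permission checks.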

EII Unites Structured and Unstructured Data

Sometimes your need to access information goes beyond unstructured content; you need to pull information from databases, too. For example, in a mortgage process you may need to access documents such as the loan application and home appraisal report as well as database information such as credit bureau data, current rates and customer account status. In such cases EII software may come into play. EII isn't exactly new; rather, it's the latest incarnation of database integration software formerly known as "heterogeneous distributed database," "virtual centralized database," "federated database," "data integration system" or "enterprise data access."
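As a rough illustration of the mortgage case, the sketch below assembles one view of a loan from a database row and a list of related documents. The table, its columns, `LoanView` and `build_loan_view` are hypothetical; an in-memory SQLite database stands in for the account system, and a plain dictionary stands in for the document repository.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LoanView:
    """Combined view of one loan: structured fields plus related documents."""
    loan_id: int
    credit_score: int
    status: str
    documents: list[str] = field(default_factory=list)


def build_loan_view(conn: sqlite3.Connection,
                    doc_index: dict[int, list[str]],
                    loan_id: int) -> LoanView:
    """Pull structured data from the database and attach the loan's documents."""
    row = conn.execute(
        "SELECT credit_score, status FROM accounts WHERE loan_id = ?", (loan_id,)
    ).fetchone()
    if row is None:
        raise KeyError(loan_id)
    return LoanView(loan_id, row[0], row[1], doc_index.get(loan_id, []))


# Hypothetical sample data for a single loan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (loan_id INTEGER PRIMARY KEY, credit_score INTEGER, status TEXT)")
conn.execute("INSERT INTO accounts VALUES (101, 712, 'current')")
docs = {101: ["loan_application.pdf", "home_appraisal.pdf"]}

view = build_loan_view(conn, docs, 101)
print(view)
```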

Ovum defines EII as "a packaged data federation middleware solution that provides access to, and a unified view of, multiple types of data sources."
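The core idea of data federation (one query and one result set over separately stored data) can be shown in miniature with SQLite's `ATTACH DATABASE` statement. This is only an analogy: commercial EII middleware federates heterogeneous, networked database systems, while this sketch joins two in-memory SQLite databases. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attach a second, independent database under its own schema name.
# Done first, because SQLite refuses ATTACH inside an open transaction.
conn.execute("ATTACH DATABASE ':memory:' AS rates")

# Main database: hypothetical loan-origination system.
conn.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, applicant TEXT, rate_code TEXT)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)",
                 [(101, "Ana", "FIX30"), (102, "Ben", "ARM5")])

# Attached database: hypothetical rate-publishing system.
conn.execute("CREATE TABLE rates.current_rates (rate_code TEXT PRIMARY KEY, apr REAL)")
conn.executemany("INSERT INTO rates.current_rates VALUES (?, ?)",
                 [("FIX30", 6.25), ("ARM5", 5.5)])

# One query, one unified result set, two underlying databases.
unified = conn.execute(
    """
    SELECT l.loan_id, l.applicant, r.apr
    FROM main.loans AS l
    JOIN rates.current_rates AS r ON r.rate_code = l.rate_code
    ORDER BY l.loan_id
    """
).fetchall()
print(unified)
```

The `ATTACH` comes before any inserts because Python's `sqlite3` module implicitly opens a transaction before an `INSERT`, and SQLite will not attach a database while one is open.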

Just as content integration software forms a middleware layer for accessing different content repositories, EII provides a middleware layer for accessing data in different databases. Some EII vendors are now extending that access to unstructured information. None has yet fully spanned both data types, but some vendors have added free-text indexing that lets users search unstructured data by keyword or concept.
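Free-text keyword search over unstructured content is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a generic, minimal version with hypothetical function names and sample documents; it is not any EII vendor's indexing engine, and it handles keywords only, not the concept-based search some products offer.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a real engine would also stem and drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


docs = {
    "appraisal-17": "Home appraisal report for 12 Elm Street",
    "memo-3": "Credit memo: appraisal pending",
}
idx = build_index(docs)
print(sorted(keyword_search(idx, "appraisal")))
print(sorted(keyword_search(idx, "home appraisal")))
```

Building the index once and answering queries from it is what makes keyword search fast enough to sit alongside database queries in the same middleware layer.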

"Historically, companies have focused on the data side, [yet] content is critical to many business processes," says Moore of Forrester, adding that the next level of progression is to integrate both of those silos using the same tool.

Providers of EII that are embracing content include Actuate, Journee, SAP and, with its Venetica purchase, IBM. EMC is also building an EII platform as part of its Information Lifecycle Management framework.

"Our customers were saying, 'it's not just about content, it's about information,'" says Stuart Levinson, president of Venetica. "Customers of IBM's [heretofore database-oriented] DB2 Information Integrator were saying, 'it's not just about structured information.'"

By 2007, a new "organic information abstraction" (OIA) layer will emerge to connect separate environments of data, content and text, according to a recent Forrester report by Laura Orlov. "OIA will provide a set of services and metadata that harness insight from these assets — without complex and brittle customization."

Firms will need a common view across structured and unstructured information to drive new customer service, compliance, and sales and marketing applications, Forrester contends, and it advocates building up expertise in areas such as taxonomy and metadata development that cross both domains.

Bringing Together the Pipelines

While each of these approaches provides a piece of the content integration puzzle, no one option will address all of your information access challenges. Large organizations will need several of these technologies in combination to achieve a free flow of information throughout the enterprise.

"All these paths deliver parts of the answer," says Lachal of Ovum. "The hard work is not so much understanding these technologies and pitting them against each other, but realizing the extent to which mixing a little this and a little that will enable you to fit your needs and budget constraints."

Each approach might work on its own if your needs are narrowly defined. But to cover both structured and unstructured data, you'll need to understand how all these technologies work together to find the right point of convergence.



