If the term "enterprise information integration" isn't immediately clear, there's a reason: EII is a broad notion that raises more questions than it answers. How do you know if EII is right for your organization? What are the challenges of implementing EII? Above all, what does EII offer that isn't already covered by data warehousing and data extract, transform and load (ETL) software procedures? How is it different from customer data integration and other recent approaches to information integration?
All the acronyms are enough to make your head spin. In this article, we'll clarify EII and its role, particularly in business intelligence and data warehousing scenarios.
Break with Tradition
• Review your data integration needs. Suppose you want to add a stock price and other company information to your portal, and this information is widely distributed. Or your key supplier has started providing new information in data files not currently handled by your ETL process, and you are loath to open this Pandora's box. Rather than have three new applications under development, each incorporating spaghetti code routines, consider a virtual, integrated data store that all applications could use. The single, virtual view benefits multiple reporting and data analysis needs, rather than just one need.
• Understand the EII tool's limitations. James Markarian, chief technology officer at Informatica Corp., suggests that the scalability of EII solutions may be determined by factors such as query performance and caching. Make sure the EII solution addresses your scalability requirements.
• Explore the full range of vendors. Avaki, Composite Software, Ipedo and MetaMatrix are prominent EII tool providers. However, major software vendors such as BEA, IBM, Oracle and SAP are filling out their portfolios with some form of EII technology, perhaps as CDI, RDM or MDM. Organizations with predominantly IBM technology, for example, should take a good look at DB2 Information Integrator, while Oracle shops should consider Customer Data Hub. Ascential Software and Informatica, the two heavyweights of traditional ETL integration, don't field EII tools, but this may change in 2005.
• Begin with a prototype, and stick with it. EII has come a long way from the early attempts at query federation, which has been an "emerging technology" in database research and development circles for some time. Still, as a new technology, it will require some fortitude to implement. Vendors will need your faithful partnership to realize the potential. Identify the types of information you'd like to integrate and create a prototype that not only demonstrates integration but also lets you measure performance and run diagnostics.
EII differs most from conventional ETL-oriented data warehousing in that it accesses, rather than moves, information. Keep in mind that ETL really isn't one standard procedure but multiple processes that vary according to what an organization needs. However, ETL generally involves data movement to a central repository or other files and subsystems, such as data marts, that support BI reporting. EII uses virtualization to present clients with a view of one consolidated information resource, hiding the federated query system that's actually drawing from multiple data resources. EII "plays the data where it lays," as some put it.
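To make the access-rather-than-move idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular EII product works: two separate in-memory SQLite databases stand in for independent source systems, and the table names, column names and the `customer_summary` function are all hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for separate source systems.
# All table and column names here are hypothetical.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")]
)

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 120.0), (1, 80.0), (2, 45.5)]
)


def customer_summary(customer_id):
    """Virtual view: read both sources at call time and combine the results.

    Nothing is copied into a central store, so the answer always reflects
    the current state of each source.
    """
    row = crm.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    total = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()[0]
    return {"id": customer_id, "name": row[0], "total_billed": total}
```

A client calling `customer_summary(1)` sees one consolidated record and never learns that two systems were queried. A real EII tool would add a query planner, security and many more source types on top of this basic shape.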
The number and complexity of data silos — disparate, disconnected resources beholden to a single department or user — continue to grow, outpacing IT's attempts to standardize ETL and data integration tools as well as efforts to update and maintain what still dominates most integration efforts: custom code. Regulatory compliance, real-time BI and new challenges involving convergence of structured and unstructured information are putting even more pressure on conventional approaches.
EII could be a solution to some of these woes. Along with less movement, EII involves less extensive data transformation, focusing on combining diverse definitions of data elements and presenting the result as a single information element. Strong global query optimization is critical to EII and will always be a challenge; however, optimization plays to the strengths of established database management vendors as well as newer vendors applying the latest algorithms. Automated intelligence in query optimization, as well as full support for universal data access standards (such as ODBC, JDBC and XML), can take the burden of knowing the intricacies of each data source off the application or user.
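One simple optimization a federated query engine might apply is pushing a filter down to the source instead of fetching everything and filtering afterward. The sketch below is a toy illustration of that single idea, assuming a hypothetical `orders` table in an in-memory SQLite database; real optimizers weigh many more factors, such as join order and source capabilities.

```python
import sqlite3

# A stand-in data source with 100 rows; every fourth order is from "EU".
# The table and its columns are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 101)],
)


def fetch_then_filter(region):
    """Naive plan: transfer every row, then filter on the integration side."""
    rows = source.execute("SELECT id, region, amount FROM orders").fetchall()
    matches = [r for r in rows if r[1] == region]
    return matches, len(rows)  # second value: rows transferred from the source


def push_down_filter(region):
    """Optimized plan: let the source apply the predicate before transfer."""
    rows = source.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return rows, len(rows)  # second value: rows transferred from the source
```

Both functions return the same matching rows, but the second transfers only the rows that satisfy the filter. The application asks the same question either way; choosing the cheaper plan is the tool's job.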
Performance is removed from the programmer/administrator domain and is given over to the EII tool. Most EII tools employ data caching or staging to improve query performance. While EII is most often used just to read data, the federated approach could work for bidirectional transactions, that is, updating and manipulating data across multiple sources. Emerging service-oriented architectures (SOAs) and enterprise service bus (ESB) technology will work with EII to let clients consume data and expose information as part of Web services.
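The caching mentioned above can be sketched as a small time-to-live (TTL) cache placed in front of a source. This is one common caching strategy, chosen here for illustration; the article does not say which strategies specific EII tools use, and the `CachedSource` class and its parameters are hypothetical.

```python
import time


class CachedSource:
    """Wrap a fetch function with a time-to-live cache (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # callable that queries the real source
        self._ttl = ttl_seconds  # how long a cached answer stays valid
        self._clock = clock      # injectable clock, which makes testing easy
        self._cache = {}         # key -> (expires_at, value)
        self.source_calls = 0    # number of times the real source was hit

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: answer without touching the source
        self.source_calls += 1
        value = self._fetch(key)
        self._cache[key] = (now + self._ttl, value)
        return value
```

The trade-off is freshness: a longer TTL means fewer trips to the source but a greater chance of serving stale data, which matters when the promise is on-demand access to current information.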
The Metadata Layer
A key aspect of EII — and one supported by most of the tools in the marketplace — is robust data modeling and metadata management. While the dream of a single data or information model within an enterprise remains elusive and in most cases impractical, EII helps establish an integration architecture that melds modern universal access standards with data about the sources, and about the information requirements, such as a single view of a product or customer. In other words, the focus is on how data is used rather than on generic relational data modeling for ETL.
Metadata is important for another goal: reusability. EII can help business analysts and developers maintain virtual views, including logic and interfaces required to create and maintain such perspectives on customers, products and other objects of interest. EII tools work with the metadata layer to ensure security of the metadata and interact with security at disparate data sources.
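A rough sketch of how such reusable metadata might look: the view is described once as data, and any consumer can build it from whatever the sources return. The `FieldMapping` structure, the source names and the field names are assumptions made for illustration, not the metadata model of any EII product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    source: str        # logical name of the data source
    source_field: str  # field name as that source defines it
    view_field: str    # field name in the unified view


# Hypothetical metadata describing a single view of a customer.
CUSTOMER_VIEW = [
    FieldMapping("crm", "cust_name", "name"),
    FieldMapping("crm", "cust_no", "customer_id"),
    FieldMapping("billing", "acct_balance", "balance"),
]


def build_view(mappings, records_by_source):
    """Assemble one unified record from per-source records using the mappings.

    Fields whose source is unavailable or lacks the field are left out.
    """
    view = {}
    for m in mappings:
        record = records_by_source.get(m.source, {})
        if m.source_field in record:
            view[m.view_field] = record[m.source_field]
    return view
```

Because the mapping lives in metadata rather than in application code, a report, a portal and a Web service could all reuse `CUSTOMER_VIEW`, and a renamed source field would be fixed in one place.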
To sum up, EII is a data integration and virtualization technique aimed at providing a unified view of data — a single version of the truth. It does so by facilitating access to multiple or disparate data sources on demand in a secure, efficient manner.
What EII Is Not
To understand EII, it's important to consider what it is not. We've discussed some of the key ways in which EII differs from ETL, which is a large-volume, batch-oriented approach to data movement and transformation. EII, on the other hand, is mainly about retrieving data on demand.
Does EII compete with data warehousing? Despite the buzz, vendors unanimously answer "no." EII supplements, rather than supplants, data warehousing. EII can help data warehousing by bringing in data from minor or nonstandard sources. It can also federate data from the warehouse itself, join it with data from other sources, and present it to the user or client application on demand. BI, the main consumer of data warehousing, is also an EII consumer.