Data integration will be a top story in information technology in 2011.
Whether your interests are in business intelligence, information access, or operations, there are clear and compelling benefits in linking enterprise data -- customer profiles and transactions, product and competitive information, weblogs -- to business-relevant content drawn from the ever-growing social/online information flood.
ETL (extract, transform, load) to data stores, together with the younger, load-first variant ELT, will remain the leading integration approaches. But they'll be complemented by new, dynamic capabilities provided by mash-ups and by semantic integration, driven by data profiles (type, distribution, and attributes of values) rather than by rigid, application-specific data definitions.
These newer, beyond-ETL approaches constitute a New Data Integration. The approaches were developed to provide easy-to-use, application-embedded, end-user-focused integration capabilities.
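To make "driven by data profiles" concrete, here is a minimal Python sketch of profile-based field matching. It is an illustration only, not any vendor's method: the field names, the sample values, the 0.5 threshold, and the choice of value overlap as the similarity measure are all assumptions, and the sketch looks only at inferred type and distinct values, not at full distributions.

```python
def infer_type(values):
    """Return 'number' if every value parses as a float, else 'text'."""
    try:
        for v in values:
            float(v)
        return "number"
    except (TypeError, ValueError):
        return "text"


def profile(values):
    """Build a tiny profile: inferred type plus the set of distinct values."""
    return {"type": infer_type(values), "distinct": set(values)}


def similarity(p1, p2):
    """Score two profiles by overlap of distinct values (0.0 if types differ)."""
    if p1["type"] != p2["type"]:
        return 0.0
    union = p1["distinct"] | p2["distinct"]
    if not union:
        return 0.0
    return len(p1["distinct"] & p2["distinct"]) / len(union)


def suggest_matches(source_a, source_b, threshold=0.5):
    """List (field_a, field_b, score) pairs whose profiles look alike."""
    suggestions = []
    for name_a, vals_a in source_a.items():
        for name_b, vals_b in source_b.items():
            score = similarity(profile(vals_a), profile(vals_b))
            if score >= threshold:
                suggestions.append((name_a, name_b, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])


# Hypothetical sources whose field names share nothing in common.
crm = {"cust_state": ["CA", "NY", "TX", "CA"], "revenue": ["100", "250", "75", "300"]}
feed = {"region": ["NY", "CA", "TX"], "amount": ["40", "55", "60"]}

matches = suggest_matches(crm, feed)
```

The point of the sketch is that "cust_state" and "region" pair up because their values look alike, not because anyone declared a mapping between them in advance.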
The New Data Integration responds to the volume and diversity of data sources and needs and to growing demand for do-it-yourself data analysis. I explored these ideas last year in an article on 'NoETL'. In this follow-up I consider five examples, with capsule reviews of same-but-different approaches at Tableau, Attivio, FirstRain, Google, and Extractiv. Each example illustrates a path to the New Data Integration.
Tableau: Easy Exploration
No BI vendor better embodies the DIY spirit than Tableau Software. The company's visual, exploratory data analysis software lets end users delve into structured data sources and share and publish analyses. By "structured data sources," I mean anything ranging from Excel spreadsheets to very large databases managed with high-end data-warehousing systems. Tableau's power and ease of use have won the company an enthusiastic following.
Tableau's Data Blending capability, new in November's Tableau 6.0 release, caught my attention. The software will not only suggest joins for data fields across sources, by name and characteristics; according to Dan Jewett, Tableau VP of Product Management, it will also aggregate values, for instance rolling up months to quarters, to facilitate fusing like data stored at different aggregation levels.
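The roll-up idea can be sketched in a few lines of Python. This is not Tableau's implementation; the figures, the dictionary layout, and the function names are hypothetical, chosen only to show how monthly values might be aggregated to quarters before being joined to data already held at the quarterly level.

```python
from collections import defaultdict

# Hypothetical monthly actuals keyed by (year, month).
monthly_actuals = {
    (2010, 1): 100, (2010, 2): 120, (2010, 3): 110,
    (2010, 4): 130, (2010, 5): 125, (2010, 6): 140,
}

# Hypothetical budget already stored at the quarterly level, keyed by (year, quarter).
quarterly_budget = {(2010, 1): 300, (2010, 2): 400}


def roll_up_to_quarters(monthly):
    """Sum monthly values into (year, quarter) buckets."""
    quarterly = defaultdict(int)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1
        quarterly[(year, quarter)] += value
    return dict(quarterly)


def blend(actuals, budget):
    """Join two quarterly series on the (year, quarter) keys they share."""
    return {
        key: {"actual": actuals[key], "budget": budget[key]}
        for key in sorted(actuals.keys() & budget.keys())
    }


quarterly_actuals = roll_up_to_quarters(monthly_actuals)
blended = blend(quarterly_actuals, quarterly_budget)
```

Once both series sit at the same grain, the join itself is trivial; the work is in choosing the common aggregation level.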
The software also supports "alias values" for use in blending relationships. For instance, it can match state names to abbreviations, part numbers to part names, and coded values such as 0 and 1 to "male" and "female."
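A rough Python sketch of alias-based matching follows. The alias table, the row layout, and the helper names are assumptions made for illustration; they do not reflect how Tableau stores or applies aliases.

```python
# Hypothetical alias table mapping full state names to abbreviations.
STATE_ALIASES = {"California": "CA", "New York": "NY", "Texas": "TX"}

# Hypothetical in-house CRM rows (full names) and partner feed rows (abbreviations).
crm_rows = [
    {"state": "California", "customers": 40},
    {"state": "Texas", "customers": 25},
]
feed_rows = [
    {"state": "CA", "leads": 12},
    {"state": "NY", "leads": 7},
    {"state": "TX", "leads": 9},
]


def normalize_state(value, aliases=STATE_ALIASES):
    """Map a known alias to its canonical form; pass other values through."""
    return aliases.get(value, value)


def blend_on_state(primary, secondary):
    """Attach secondary 'leads' to each primary row after normalizing the key."""
    secondary_by_state = {normalize_state(r["state"]): r for r in secondary}
    blended = []
    for row in primary:
        key = normalize_state(row["state"])
        match = secondary_by_state.get(key, {})
        blended.append(
            {"state": key, "customers": row["customers"], "leads": match.get("leads")}
        )
    return blended


blended_rows = blend_on_state(crm_rows, feed_rows)
```

The same pattern would cover part numbers versus part names or coded values versus labels: normalize both sides through the alias table, then join on the normalized key.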
Usage scenarios include comparing budget and sales projections to actuals, where users may compare spreadsheet-held values to corporate records. The software also supports blending of external-source information into corporate data.
"Marketing organizations often get data feeds from suppliers and partners they want to join in with the in-house CRM system data," Jewett explains. "These are often ad-hoc feeds, so structured processes that IT likes to build don't support this case."
Tableau can pull data from Web sources via application programming interfaces (APIs) adhering to the Open Data Protocol (OData) standard. This capability will help users keep up with the growing volume of online data.
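For readers unfamiliar with OData, a request is an ordinary URL carrying system query options such as $select, $filter, and $top. The Python sketch below only builds such a URL; the service root, entity set, and field names are hypothetical, and nothing here describes how Tableau itself issues requests.

```python
from urllib.parse import quote, urlencode

# Hypothetical OData service root and entity set.
SERVICE_ROOT = "https://example.com/odata"
ENTITY_SET = "Sales"


def build_odata_url(root, entity_set, select=None, filter_expr=None, top=None):
    """Assemble an OData query URL from a few standard system query options."""
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = top
    # Keep "$" and "," readable; encode spaces as %20 rather than "+".
    query = urlencode(params, safe="$,", quote_via=quote)
    url = f"{root}/{entity_set}"
    return f"{url}?{query}" if query else url


url = build_odata_url(
    SERVICE_ROOT,
    ENTITY_SET,
    select=["Region", "Sales"],
    filter_expr="Year eq 2010",
    top=50,
)
```

Because the query is just a URL, any tool that can issue an HTTP GET and parse the response can consume the feed.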
Tableau, like the vast majority of BI applications, works exclusively with "structured" data. That focus must and will change as users confront an imperative to tap online and social sources, via search- and text-analytics-enhanced BI.