5 Paths To The New Data Integration - InformationWeek

Software // Information Management
Commentary
1/11/2011
06:32 PM
Seth Grimes

5 Paths To The New Data Integration

Embedded, automatic and easy new approaches meet growing demands for do-it-yourself data analysis.

Attivio: Universal and Unified

Enterprise search and BI have each been around for decades, largely operating in information silos, one restricted to documents and the other to data collected from operational and transactional systems. Attivio's aim, dating to the company's 2007 founding by refugees of FAST (a Microsoft subsidiary since 2008), has been to break down the database-document barrier by providing a search interface that relies on a single, unified index. Attivio delivers results in familiar BI dashboards and analysis widgets.

Attivio pulls data from a wide variety of disparate sources: files, databases, e-mail, content-management systems, and enterprise applications, reached via APIs and connectors supplied by the company and its partners.

The Attivio Active Intelligence Engine (AIE) extracts content (text, metadata, structure information), manipulates it, enriches it, and links or joins it. "Enrichment components such as sentiment and entity extraction and classification can be used to add intelligence to the integration process," says company co-founder and CTO Sid Probstein. "They require some setup work, mostly training on the customer's data."

Attivio performs "dynamic schema creation" based on discovered data values and types, and "we have a number of components that identify and report on integration opportunities after a small data set is processed," Probstein says.

According to Probstein, techniques include:

  • Detecting, by name or by content, that two columns, from the same or different sources, appear to be the same.

  • Detecting that two tables, from the same or different sources, appear to be joinable based on some common key values.

  • Detecting anomalous values within a single column or table.

  • Detecting type differences between columns that have similar names.

  • Detecting duplicate or near-duplicate records based on a variety of keys.
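To make two of these ideas concrete, here is a minimal Python sketch of schema inference from discovered values and of join detection based on shared key values. It is an illustration only, not Attivio's implementation: the function names (`infer_type`, `infer_schema`, `joinable_pairs`), the overlap threshold, and the sample tables are all assumptions made for the example, and raw values are assumed to arrive as strings.

```python
def infer_type(values):
    """Return the narrowest type name ('int', 'float', 'str') that fits every value.

    Assumes raw values are strings, as they would be when read from a file.
    """
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except (TypeError, ValueError):
            continue
    return "str"


def infer_schema(rows):
    """Build a column -> type mapping from the values actually seen in the rows."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


def column_values(rows, name):
    """Collect the distinct values of one column."""
    return {row[name] for row in rows if name in row}


def joinable_pairs(rows_a, rows_b, min_overlap=0.5):
    """List (column_a, column_b, overlap) for column pairs that share key values.

    Overlap is the share of the smaller column's distinct values that also
    appear in the other column (a hypothetical scoring choice).
    """
    cols_a = sorted({key for row in rows_a for key in row})
    cols_b = sorted({key for row in rows_b for key in row})
    pairs = []
    for a in cols_a:
        values_a = column_values(rows_a, a)
        for b in cols_b:
            values_b = column_values(rows_b, b)
            if not values_a or not values_b:
                continue
            overlap = len(values_a & values_b) / min(len(values_a), len(values_b))
            if overlap >= min_overlap:
                pairs.append((a, b, overlap))
    return pairs


# Hypothetical sample data: two sources that name the customer key differently.
customers = [
    {"cust_id": "101", "name": "Acme"},
    {"cust_id": "102", "name": "Globex"},
    {"cust_id": "103", "name": "Initech"},
]
orders = [
    {"order_no": "9001", "customer": "101", "total": "25.50"},
    {"order_no": "9002", "customer": "103", "total": "12.00"},
]

print(infer_schema(orders))
print(joinable_pairs(customers, orders))  # [('cust_id', 'customer', 1.0)]
```

The join check compares content rather than column names, which is why `cust_id` and `customer` are matched even though the names differ. A production system would presumably sample large tables and combine several signals rather than rely on one overlap score.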

Attivio AIE's dynamic schemas support ad-hoc integration of diverse data, but AIE is by no means the only credible search-BI technology on the market. Endeca's Information Access Platform (IAP) uses similar techniques to provide similar capabilities, targeting online and mobile commerce and publishing in addition to search-BI. Other, specialized platforms adapt these integration techniques to focused business problems and information domains.

FirstRain Senses Time

FirstRain is a business-information search and monitoring tool that mines and integrates information from the open Web -- news, blogs, and industry, government, scientific, and academic sources -- in addition to a set of key corporate-information databases. The aim, per the company's Web site, is to "derive relationships, spot changes in management or business structure, and track trends across industries."

"The application of semantic analysis that is 'business structure aware' is crucial to be able to identify and deliver relevant business information that is scattered throughout [disparate] sources," says the company's technology vice president, Marty Betz. Also crucial is the ability to synthesize time sequence from pages found on the open Web.

(Time sequence is important! Indeed, the number-three result returned by a Google search for "us senator pennsylvania" was now-former Senator Arlen Specter's now-disappeared Senate Web page.)

"By analyzing the flow of content through our pipeline, the system can dynamically model and adjust its understanding of the market ecosystems around companies and industries," Betz says.

Betz describes the use of trending and anomaly detection, applied to unstructured narrative content from a variety of sources, to enable a different class of questions, those whose answers require "connecting the dots," to be systematically asked, analyzed and answered.
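As a rough illustration of what anomaly detection over a content stream can look like, the Python sketch below flags days on which the number of documents mentioning a company departs sharply from its recent baseline. FirstRain's actual methods are not described here, so everything in the sketch is an assumption: the function name `flag_spikes`, the trailing-window z-score test, the window and threshold values, and the sample counts.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=5, threshold=3.0):
    """Return the indexes of days whose count deviates sharply from the trailing window.

    A day is flagged when it lies more than `threshold` standard deviations
    from the mean of the previous `window` days.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # A perfectly flat history: any change at all counts as a deviation.
            if counts[i] != baseline:
                spikes.append(i)
            continue
        if abs(counts[i] - baseline) / spread > threshold:
            spikes.append(i)
    return spikes


# Hypothetical daily counts of documents mentioning one company.
daily_mentions = [4, 5, 3, 4, 5, 4, 21, 5, 4]
print(flag_spikes(daily_mentions))  # [6]
```

A flagged day is only a prompt for further analysis: deciding whether a spike reflects a management change, a product launch, or noise is the "connecting the dots" step that semantic analysis would have to supply.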

So in FirstRain we have broad-but-selective content acquisition and integration, with the application of goal-relevant organizing principles, to respond to a high-value business need: timely access to corporate developments.
