Commentary
1/11/2011
06:32 PM
Seth Grimes

5 Paths To The New Data Integration

Embedded, automatic and easy new approaches meet growing demands for do-it-yourself data analysis.

Attivio: Universal and Unified

Enterprise search and BI have each been around for decades, largely operating in information silos, one restricted to documents and the other to data collected from operational and transactional systems. Attivio's aim, dating to the company's 2007 founding by refugees of FAST (a Microsoft subsidiary since 2008), has been to break down the database-document barrier by providing a search interface that relies on a single, unified index. Attivio delivers results in familiar BI dashboards and analysis widgets.
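A single index over both documents and database rows is easiest to picture as one inverted index whose postings point at either kind of record. The Python sketch below is a toy illustration of that general idea, not Attivio's implementation. The `UnifiedIndex` class, the simple tokenizer, and the AND-only query semantics are all assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class UnifiedIndex:
    """Hypothetical toy index holding documents and database rows side by side."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.records: dict[str, dict] = {}

    def add_document(self, doc_id: str, text: str) -> None:
        self.records[doc_id] = {"type": "document", "text": text}
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def add_row(self, row_id: str, row: dict) -> None:
        # Every column value is indexed as text; the row's fields are kept
        # so structured filters could be layered on later.
        self.records[row_id] = {"type": "row", **row}
        for value in row.values():
            for term in tokenize(str(value)):
                self.postings[term].add(row_id)

    def search(self, query: str) -> list[str]:
        """IDs of records (of either kind) containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


idx = UnifiedIndex()
idx.add_document("memo-1", "Acme Corp renewal delayed by pricing dispute")
idx.add_row("crm-42", {"account": "Acme Corp", "stage": "renewal", "amount": 120000})
print(idx.search("acme renewal"))  # ['crm-42', 'memo-1']
```

One query returns both the CRM row and the memo, which is the point of a unified index; a production engine would add ranking, field-aware queries, and facets to feed BI-style views.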

Attivio pulls data from a very wide variety of disparate sources: files and databases, plus e-mail, content-management, and enterprise-application systems reached via APIs and connectors (supplied by the company and its partners).

The Attivio Active Intelligence Engine (AIE) will extract content (text, metadata, structure information), manipulate it, enrich it, and link or join it. "Enrichment components such as sentiment and entity extraction and classification can be used to add intelligence to the integration process," says company co-founder and CTO Sid Probstein. "They require some setup work, mostly training on the customer's data."
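As a rough sketch of how enrichment can sit inside an extract, enrich, and link pipeline, the Python below chains plain functions over a record dictionary. The gazetteer, the sentiment word lists, and every function name are hypothetical stand-ins; Probstein notes that real enrichment components need training on the customer's data, which this toy skips entirely.

```python
import re
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]

# Hypothetical gazetteer and sentiment lexicon; a real system would use
# models trained on the customer's own data instead.
KNOWN_ENTITIES = {"acme corp": "ORG", "boston": "LOC"}
POSITIVE = {"growth", "win", "strong"}
NEGATIVE = {"delay", "loss", "weak"}


def extract(record: Record) -> Record:
    """Normalize whitespace in the raw text and pull out lowercase word tokens."""
    text = " ".join(record["raw"].split())
    return {**record, "text": text, "tokens": re.findall(r"[a-z]+", text.lower())}


def tag_entities(record: Record) -> Record:
    """Dictionary lookup standing in for a trained entity extractor."""
    lowered = record["text"].lower()
    entities = [(name, label) for name, label in KNOWN_ENTITIES.items() if name in lowered]
    return {**record, "entities": entities}


def score_sentiment(record: Record) -> Record:
    """Lexicon score: count of positive words minus count of negative words."""
    tokens = record["tokens"]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {**record, "sentiment": score}


def run_pipeline(record: Record, stages: list[Stage]) -> Record:
    """Apply each stage in order; every stage returns a new, enriched record."""
    for stage in stages:
        record = stage(record)
    return record


record = run_pipeline(
    {"id": "n1", "raw": "Acme  Corp reports strong growth despite one delay"},
    [extract, tag_entities, score_sentiment],
)
print(record["entities"], record["sentiment"])  # [('acme corp', 'ORG')] 1
```

Keeping each stage a pure function makes it easy to insert, reorder, or swap an enrichment step (say, a trained classifier) without touching the rest of the flow.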

Attivio performs "dynamic schema creation" based on discovered data values and types, and "we have a number of components that identify and report on integration opportunities after a small data set is processed," Probstein says.

Techniques include:

  • Detecting, by name or by content, that two columns, from the same or different sources, appear to be the same.

  • Detecting that two tables, from the same or different sources, appear to be joinable based on some common key values.

  • Detecting anomalous values within a single column or table.

  • Detecting type differences between columns that have similar names.

  • Detecting duplicate or near-duplicate records based on a variety of keys.
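Each of the listed techniques can be approximated with standard-library Python. The sketch below is illustrative only and is not Attivio's method: it scores name similarity plus value overlap for "same column" detection, uses value containment as a joinability signal, runs crude type inference to expose mismatches, applies the modified z-score (Iglewicz and Hoaglin) to find anomalous values, and compares chosen key fields by string similarity to find near-duplicates. All function names and thresholds (0.9 similarity, 3.5 cutoff) are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import median


def infer_type(values: list[str]) -> str:
    """Crude type inference over raw string values: int, float, or text."""
    if not values:
        return "unknown"

    def parses_as(cast, v: str) -> bool:
        try:
            cast(v)
            return True
        except ValueError:
            return False

    if all(parses_as(int, v) for v in values):
        return "int"
    if all(parses_as(float, v) for v in values):
        return "float"
    return "text"


def _normalize(name: str) -> str:
    return name.lower().replace("_", "").replace(" ", "")


def compare_columns(name_a: str, values_a: list[str],
                    name_b: str, values_b: list[str]) -> dict:
    """Score two columns as candidates for 'same column' or 'join key'."""
    set_a, set_b = set(values_a), set(values_b)
    shared = len(set_a & set_b)
    union = len(set_a | set_b) or 1
    smaller = min(len(set_a), len(set_b)) or 1
    name_sim = SequenceMatcher(None, _normalize(name_a), _normalize(name_b)).ratio()
    return {
        # Similar names after normalization ("cust_id" vs "CustID").
        "name_similarity": round(name_sim, 2),
        # Jaccard overlap of distinct values: high means the columns look alike.
        "value_jaccard": round(shared / union, 2),
        # Share of the smaller column's distinct values found in the other;
        # a high score suggests a usable join key.
        "join_containment": round(shared / smaller, 2),
        # A mismatch here between similarly named columns is worth reporting.
        "types": (infer_type(values_a), infer_type(values_b)),
    }


def anomalous_values(values: list[float], cutoff: float = 3.5) -> list[float]:
    """Flag outliers using the modified z-score based on the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]


def near_duplicates(rows: list[dict], key_fields: list[str],
                    threshold: float = 0.9) -> list[tuple[int, int]]:
    """Index pairs of rows whose combined key fields are nearly identical."""
    def key(row: dict) -> str:
        return " ".join(str(row[f]).lower().strip() for f in key_fields)

    keys = [key(r) for r in rows]
    return [(i, j) for i, j in combinations(range(len(rows)), 2)
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold]


print(compare_columns("cust_id", ["101", "102", "103"], "CustID", ["101", "102", "104", "105"]))
# {'name_similarity': 1.0, 'value_jaccard': 0.4, 'join_containment': 0.67, 'types': ('int', 'int')}
```

The pairwise comparison in `near_duplicates` is quadratic in the number of rows; at scale, systems typically group candidate records by a blocking key before comparing them.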

Attivio AIE's dynamic schemas support ad-hoc integration of diverse data, but AIE is by no means the only credible search-BI technology on the market. Endeca's Information Access Platform (IAP) uses similar techniques to provide similar capabilities, targeting online and mobile commerce and publishing in addition to search-BI. Other, specialized platforms adapt these integration techniques to focused business problems and information domains.

FirstRain Senses Time

FirstRain is a business-information search and monitoring tool that mines and integrates information from the open Web -- news, blogs, and industry, government, scientific, and academic sources -- in addition to a set of key corporate-information databases. The aim, per the company's Web site, is to "derive relationships, spot changes in management or business structure, and track trends across industries."

"The application of semantic analysis that is 'business structure aware' is crucial to be able to identify and deliver relevant business information that is scattered throughout [disparate] sources," says the company's technology vice president, Marty Betz. Also crucial is the ability to synthesize a time sequence from pages found on the open Web.

(Time sequence is important! Indeed, the number-three result returned by a Google search for "us senator pennsylvania" was now-former Senator Arlen Specter's now-disappeared Senate Web page.)
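The article doesn't say how FirstRain synthesizes time sequence. One simple heuristic, sketched here purely as an assumption, is to date each page by the latest valid date found in its text (ISO `YYYY-MM-DD` or `Month D, YYYY`), fall back to the crawl date when none is found, and prefer the freshest page on a topic. The page records, field names, and `example.com` URLs are hypothetical, and the heuristic would misfire on pages that mention future events.

```python
import re
from datetime import date
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_DATE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def infer_date(page: dict) -> date:
    """Latest valid date mentioned in the page text, else the crawl date."""
    candidates = [(int(y), int(m), int(d)) for y, m, d in ISO_DATE.findall(page["text"])]
    candidates += [(int(y), MONTHS.index(mon) + 1, int(d))
                   for mon, d, y in LONG_DATE.findall(page["text"])]
    dates = []
    for y, m, d in candidates:
        try:
            dates.append(date(y, m, d))
        except ValueError:
            pass  # ignore impossible dates such as 2011-13-45
    return max(dates) if dates else page["crawled"]


def timeline(pages: list[dict]) -> list[dict]:
    """Order pages oldest to newest by inferred date."""
    return sorted(pages, key=infer_date)


def most_recent(pages: list[dict], topic: str) -> Optional[dict]:
    """Freshest page mentioning the topic, or None if nothing matches."""
    matches = [p for p in pages if topic.lower() in p["text"].lower()]
    return max(matches, key=infer_date, default=None)


# Hypothetical crawl results.
pages = [
    {"url": "https://example.com/old-profile",
     "text": "Profile of the senator. Last updated March 3, 2009.",
     "crawled": date(2010, 12, 1)},
    {"url": "https://example.com/news",
     "text": "New senator sworn in on 2011-01-05.",
     "crawled": date(2011, 1, 6)},
    {"url": "https://example.com/undated",
     "text": "Senator biography.",
     "crawled": date(2010, 6, 1)},
]
print(most_recent(pages, "senator")["url"])  # https://example.com/news
```

Ranking by relevance alone would treat all three pages alike; adding an inferred date is what lets the stale profile drop behind the newer item.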

"By analyzing the flow of content through our pipeline, the system can dynamically model and adjust its understanding of the market ecosystems around companies and industries," Betz says.

Betz describes applying trending and anomaly detection to unstructured narrative content from a variety of sources, so that a different class of questions, those whose answers require "connecting the dots," can be systematically asked, analyzed, and answered.
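Betz doesn't detail FirstRain's algorithms. As a hedged illustration of trending and anomaly detection over a content flow, the sketch below works on a hypothetical series of daily mention counts for one company: a least-squares slope measures the trend, and a trailing-window threshold flags spikes. The function names, the 7-day window, k = 3, and the 1.0 floor on the standard deviation are arbitrary assumptions.

```python
from statistics import mean, pstdev


def trend_slope(counts: list[float]) -> float:
    """Least-squares slope of counts against day index (change in mentions per day)."""
    n = len(counts)
    if n < 2:
        return 0.0
    x_bar, y_bar = (n - 1) / 2, mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(counts))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def flag_spikes(counts: list[float], window: int = 7, k: float = 3.0,
                min_sigma: float = 1.0) -> list[int]:
    """Indices of days whose count exceeds the trailing-window mean by k standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        # Floor sigma so a perfectly flat history does not flag tiny bumps.
        sigma = max(pstdev(history), min_sigma)
        if counts[i] > mean(history) + k * sigma:
            spikes.append(i)
    return spikes


# Hypothetical daily mention counts for one company.
counts = [4, 5, 4, 6, 5, 4, 5, 30, 5, 4]
print(flag_spikes(counts))  # [7]
```

A flagged day is only a prompt to look at the underlying stories; the "connecting the dots" step, linking the spike to a management change or deal, still needs the semantic analysis Betz describes.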

So in FirstRain we have broad-but-selective content acquisition and integration, with the application of goal-relevant organizing principles, to respond to a high-value business need: timely access to corporate developments.
