In-Database Analytics: A Passing Lane for Complex Analysis - InformationWeek

12/15/2008
12:06 PM

In-Database Analytics: A Passing Lane for Complex Analysis

What once took one company three to four weeks now takes four to eight hours thanks to in-database computation. Here's what Netezza, Teradata, Greenplum and Aster Data Systems are doing to make it happen.

Upstarts Join the Race

Greenplum and Aster Data Systems are newer but fast-growing entrants in the MPP data-warehousing market. Both claim petabyte-scale capabilities, and both implement a MapReduce framework for parallelization, an approach notably employed by Google to scale indexing and search. (MapReduce, in the form of the open-source Apache Hadoop project, is the basis for HBase, an open-source, distributed, column-store database system.)
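For readers unfamiliar with the pattern, the MapReduce idea can be sketched in a few lines of Python. This is a single-process illustration only, not Greenplum's or Aster's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for clarity, and a real MPP system would run the map step on each node in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Emit (word, 1) pairs for one partition of documents."""
    return [(word.lower(), 1) for doc in partition for word in doc.split()]


def shuffle(mapped_pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(partitions):
    # Each partition stands in for the data held by one node.
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle(mapped))


# Hypothetical data split across two "nodes".
partitions = [["in database analytics"], ["database analytics at scale"]]
counts = map_reduce(partitions)
print(counts)
```

Because the map step touches only its own partition, it can run wherever the data already lives, which is the property that makes the pattern attractive for in-database work.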

Greenplum executives claim that any developer can program in-database analytics for the Greenplum platform, and they estimate that 20 percent of customers are using Greenplum-embedded analytics. Greenplum CTO Luke Lonergan describes implementing advanced analytics tools, including routines from the BLAS and LINPACK linear algebra libraries and MLinReg multilinear regressions from Statpak, as well as an initiative to embed the open-source R statistical programming language in Greenplum. (In fact, the open-source PostgreSQL database system, on which Greenplum is based, has procedural language bindings for R, Python, Perl, and other programming languages that Greenplum can exploit.)

Greenplum claims to have more than 50 customers, including MySpace, eBay, the New York Stock Exchange, and Sun Microsystems. "Our customers are building their own enterprise-analytics applications," says Paul Salazar, Greenplum's vice president of marketing, "and we're trying to make it easy for them." In addition to the MapReduce and "programmable parallel analytics" implementations, Greenplum 3.2, released in September, gained in-database compression and enhanced database-monitoring capabilities.

Aster Data Systems touts the capability of its Aster nCluster 3.0 analytical database, released in October, to support "frontline" functions including credit scoring, behavioral ad-targeting, fraud detection, spam denial, recommendations, and risk modeling. CEO Mayank Bawa says the company has parallelized commonly used algorithms that provide for sequential pattern analysis and other transformations on live data. "That means you can do modeling inside the DB without exporting to SAS or other software," Bawa says. The executive points to linear regression, time-series modeling, and k-means clustering as examples of Aster-provided algorithms.
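To see why an algorithm such as linear regression parallelizes well inside a database, consider the following Python sketch. It is not Aster's code; it assumes a simple one-variable regression and uses hypothetical helper names (`partial_sums`, `combine`, `fit_line`). Each partition computes a handful of running sums locally, and only those small summaries need to be combined to fit the model.

```python
def partial_sums(rows):
    """Per-partition summary: n, sum(x), sum(y), sum(x*x), sum(x*y)."""
    n = sx = sy = sxx = sxy = 0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(summaries):
    """Add the per-partition summaries element by element."""
    return tuple(sum(parts) for parts in zip(*summaries))


def fit_line(summary):
    """Ordinary least squares for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = summary
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data, y = 2x + 1, split across three "nodes".
partitions = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9), (5, 11)],
]
slope, intercept = fit_line(combine(partial_sums(p) for p in partitions))
print(slope, intercept)
```

The rows themselves never leave their partition; only five numbers per partition travel to the coordinator, which is the kind of saving the "no exporting" claim relies on.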

Bawa says Aster's nCluster implementation of a MapReduce parallelization framework provides a "procedural programming paradigm" that supports SQL-invoked execution of off-the-shelf and user-programmed functions. "We have taken enormous care to make the process easy," says Bawa, adding that in-database code encapsulation ensures DBMS stability.
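The general idea of a user-programmed function invoked from SQL can be illustrated with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in, since the article does not show nCluster's actual syntax; the `accounts` table, the `risk_score` function, and its scoring rule are all hypothetical.

```python
import sqlite3


def risk_score(balance, late_payments):
    """Hypothetical scoring rule, for illustration only (0 to 100)."""
    score = 10 * late_payments + (20 if balance > 10000 else 0)
    return min(100, score)


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("risk_score", 2, risk_score)

conn.execute(
    "CREATE TABLE accounts (id INTEGER, balance REAL, late_payments INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 500.0, 0), (2, 25000.0, 3), (3, 12000.0, 9)],
)

# The function runs inside the query, next to the data.
rows = conn.execute(
    "SELECT id, risk_score(balance, late_payments) FROM accounts ORDER BY id"
).fetchall()
print(rows)
```

The analyst writes ordinary SQL, and the procedural logic stays encapsulated in the registered function.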

Netezza may have been the first to ship parallel, database-embedded analytics, but rivals were quick to launch competing technologies. More solutions and expanded partnerships are prominently on the roadmap for Aster, Greenplum, Netezza, and Teradata. The advances will help customers tap data warehouses for frontline, operational analytics, and they will help keep pressure on data-warehousing vendors, including Oracle and Microsoft, that do not yet offer database-system parallelization.
