MapReduce is supported by and often associated with Hadoop, a fast-growing open-source project that is popular among Internet giants, but there's a comparatively tiny (and high-cost) pool of experts capable of deploying and managing Hadoop environments.
The key benefit of Aster's SQL-MapReduce framework is that it makes MapReduce accessible to SQL-literate data professionals within Aster's SQL-based database. Thus, the platform supports pattern detection, graph analysis, and time-series analysis on data such as clickstreams--the sorts of analyses employed to uncover Web purchase patterns or to determine the effectiveness of online and email marketing campaigns.
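For readers unfamiliar with the pattern, here is a minimal sketch of a map/shuffle/reduce pass over clickstream data, written in plain Python. It is not Aster's SQL-MapReduce syntax or API; the function names, the record layout (user, timestamp, page), and the sample clicks are all assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(clicks):
    """Map: emit (key, value) pairs keyed by user."""
    for user, timestamp, page in clicks:
        yield user, (timestamp, page)

def reduce_phase(events):
    """Reduce: order one user's events by time and return the navigation path."""
    return tuple(page for _, page in sorted(events))

def count_paths(clicks):
    # Shuffle: group every mapped value under its key.
    grouped = defaultdict(list)
    for user, event in map_phase(clicks):
        grouped[user].append(event)
    # Reduce each group, then tally how many users followed each path.
    path_counts = defaultdict(int)
    for events in grouped.values():
        path_counts[reduce_phase(events)] += 1
    return dict(path_counts)

# Hypothetical sample data: (user, timestamp, page)
clicks = [
    ("u1", 1, "home"), ("u2", 1, "home"), ("u1", 2, "search"),
    ("u2", 3, "cart"), ("u1", 3, "cart"),
    ("u3", 1, "home"), ("u3", 2, "search"), ("u3", 4, "cart"),
]
paths = count_paths(clicks)
```

In this toy data set, two users reach the cart via search and one goes there directly from the home page, which is the kind of navigation-path question the article describes.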
The Aster Data 5.0 upgrades include pre-built MapReduce modules for behavioral clickstream interpretation (why are people following certain navigation paths?), marketing attribution (which email campaigns and banners are driving purchases?), decision-tree analysis (what choices are customers making?), and other analyses. A workload management framework has also been improved to handle memory allocation of SQL and MapReduce processes.
The Aster MapReduce Appliance set for release next year will take advantage of Teradata's hardware expertise and buying power. It will put the Aster database on the hardware used for the Teradata Data Warehouse Appliance, but no details were available on cost or capacities.
Teradata is wisely retaining Aster's current offering of cloud-based or stand-alone database software, giving customers the choice of how they wish to deploy Aster Data. The primary competition to Aster is the combination of an incumbent SQL-based data warehouse and a new Hadoop deployment.
EMC bet on Hadoop last May when it introduced community and commercial Hadoop software distributions. This week the vendor added the EMC Greenplum Modular Data Computing Appliance, which is capable of hosting Greenplum (SQL) database deployments and Hadoop deployments on a single box.
Adding to its column-store announcement, Teradata enumerated other Teradata 14 upgrades aimed at making data warehousing "far simpler," with automated capabilities for workload management and partitioning, compression decisions, and temporal (time-based) analyses. The new workload management features are designed to give administrators fine-grained control over service levels down to CPU and data input/output (I/O) usage levels.
Teradata 14 supports virtual partitions that will enable administrators to assign service levels within service levels, giving, say, 60% of capacity to a division in Germany and 40% to a unit in the U.K., and then certain CPU and I/O levels to specific departments, functions, or queries within those offices.
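The arithmetic of "service levels within service levels" can be sketched as nested fractions that multiply down a hierarchy. The snippet below is an illustration only, not Teradata's configuration syntax: the 60/40 split comes from the example above, while the department names and their sub-splits are hypothetical.

```python
def effective_shares(tree, parent_share=1.0, prefix=""):
    """Flatten nested fractional shares into fractions of total capacity."""
    result = {}
    for name, (share, children) in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        absolute = parent_share * share
        if children:
            # Descend: children split whatever their parent was given.
            result.update(effective_shares(children, absolute, path))
        else:
            result[path] = absolute
    return result

# Each entry is name -> (share of parent, children).
# Department names and their splits are assumptions for illustration.
partitions = {
    "germany": (0.60, {"finance": (0.50, {}), "marketing": (0.50, {})}),
    "uk": (0.40, {"finance": (0.25, {}), "operations": (0.75, {})}),
}

shares = effective_shares(partitions)
for path, fraction in sorted(shares.items()):
    print(f"{path}: {fraction:.0%} of total capacity")
```

Under these assumed splits, a department that gets a quarter of the U.K. unit's 40% ends up with 10% of the whole system, and the leaf shares still sum to 100%.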
Teradata already had the ability to move "hot," frequently accessed data to cache or fast disks and "cold," infrequently accessed data to slower disks. The database upgrade adds a Compress on Cold feature that automatically applies appropriate levels of compression based on the same hot/cold analysis. Little-used data will be compressed at up to a 5-to-1 ratio to maximize available storage space.
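As a rough back-of-the-envelope sketch of what compressing only cold data could save: the function below assumes the best-case 5-to-1 ratio applies to all cold data and that hot data stays uncompressed. The 100 TB warehouse size and the 80% cold fraction are hypothetical inputs, and real ratios would vary with the data.

```python
def storage_after_compression(total_tb, cold_fraction, ratio=5.0):
    """Estimate storage used when only the cold portion is compressed.

    ratio is the assumed compression ratio for cold data (5.0 means 5-to-1).
    """
    cold_tb = total_tb * cold_fraction
    hot_tb = total_tb - cold_tb
    return hot_tb + cold_tb / ratio

# Hypothetical warehouse: 100 TB, of which 80% is rarely accessed.
used_tb = storage_after_compression(100, 0.8)
print(f"Estimated footprint: {used_tb:.1f} TB of an original 100 TB")
```

Because the quoted figure is an upper bound ("up to" 5-to-1), an estimate like this should be read as a best case rather than a forecast.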
New temporal capabilities will help global organizations recognize variations in business calendars from country to country. If a work week technically begins on Sunday in one country and Monday in another, this feature lets companies time-specify the data they analyze from those countries when they're trying to analyze weekly payroll or inventory, for example.
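The calendar problem is easy to see in a few lines of code. This sketch uses Python's standard `datetime` module, not Teradata's temporal syntax, and the helper name `week_start` is an assumption; it shows how the same date falls into different reporting weeks depending on which day a country treats as the start of the week.

```python
from datetime import date, timedelta

MONDAY, SUNDAY = 0, 6  # matches date.weekday(): Monday is 0, Sunday is 6

def week_start(day, first_weekday):
    """Return the first day of the week containing `day`."""
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)

sample = date(2011, 9, 18)  # a Sunday

# In a Monday-start calendar this Sunday closes the week of Sept. 12;
# in a Sunday-start calendar it opens a new week.
monday_based = week_start(sample, MONDAY)
sunday_based = week_start(sample, SUNDAY)
print(monday_based, sunday_based)
```

A weekly payroll or inventory roll-up that ignores this difference would assign that Sunday's records to different weeks in the two countries.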
With these latest moves, Teradata continues to keep some distance between itself and the rest of the data warehousing pack in terms of advanced features and the size and influence of its customer base. The Aster Data acquisition has brought the company into an emerging market where it had yet to make significant inroads.
EMC and IBM Netezza continue to be Teradata's closest competitors, and EMC in particular seems intent on matching or besting it in advanced areas such as in-database analytics and multi-structured data analysis.
As for Oracle, this week's release of the Oracle Database Appliance underscores that vendor's focus on mainstream database uses. It has yet to show interest in anything other than structured data or to show off an Exadata customer deployment anywhere close to the 100-terabyte league, let alone the petabyte league. That leaves lots of room for Teradata and others at the top of the market.