September 22, 2011
Data warehousing specialist Teradata announced Thursday a battery of upgrades that promise to blur boundaries between database architectures and applications.
The Teradata 14 database, due in December, will bring column-store analysis and compression capabilities to a row-store database. Aster Data 5.0, a release set for early next year, will advance that database's ability to handle a mix of structured, semi-structured, and unstructured information, and Teradata also is adding an Aster appliance built on its hardware.
Beyond these headlines, Teradata also piled on database management and automation upgrades aimed at minimizing the administrative and maintenance requirements of both databases.
Bridging the boundaries between row-store and column-store databases is a big deal, and it's something several vendors have been working on. Teradata has always been in the row-store camp along with EMC Greenplum, IBM DB2 and Netezza, Oracle Database, Microsoft SQL Server, and others. Sybase IQ was the first commercially successful column-store database, and products including HP Vertica, Infobright, and ParAccel have delivered variations on the column-store architecture.
Column-store databases have an advantage when you only need to query selected columnar attributes of data, like all the zip codes, product SKU numbers, and transaction dates in the database. That could tell you what sold where within the last month without wading through all the other data that might appear row by row, like the customer name, address, account number, and so on. Less data queried means faster results.
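To make the "less data queried" point concrete, here is a minimal Python sketch that counts how many field values a query would touch under each layout. The records, field names, and counting functions are hypothetical illustrations, not any vendor's storage engine, and the row-store count assumes the simplest case in which every field of every row is read.

```python
# Hypothetical sales records; all field names and values are made up.
rows = [
    {"customer": "A. Smith", "address": "1 Main St", "account": 1001,
     "zip": "92127", "sku": "SKU-1", "date": "2011-09-01"},
    {"customer": "B. Jones", "address": "2 Oak Ave", "account": 1002,
     "zip": "10001", "sku": "SKU-2", "date": "2011-09-03"},
    {"customer": "C. Lee", "address": "3 Elm Rd", "account": 1003,
     "zip": "92127", "sku": "SKU-1", "date": "2011-09-07"},
]

def to_columns(records):
    """Pivot row records into one list of values per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def values_touched_row_layout(records):
    """Simplified row store: every field of every row is read."""
    return sum(len(record) for record in records)

def values_touched_column_layout(columns, wanted):
    """Simplified column store: only the requested columns are read."""
    return sum(len(columns[name]) for name in wanted)

wanted = ["zip", "sku", "date"]
columns = to_columns(rows)
row_cost = values_touched_row_layout(rows)
column_cost = values_touched_column_layout(columns, wanted)
print(row_cost, column_cost)
```

In this toy case the columnar layout touches half as many values, and the gap widens as records carry more attributes that the query never asks for.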
Column-store databases also do a great job at compression because the data in columns is consistent--all zip codes, all dates, all product SKU numbers and so on. That helps column stores achieve upwards of 30-to-1 or 40-to-1 compression, depending on the data, while row-store databases, including Teradata's, max out at about 4-to-1 compression.
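One way to see why consistent column data compresses well is run-length encoding, which stores each repeated value once along with a count. The sketch below uses run-length encoding purely as an illustration; the vendors' actual compression algorithms are not described here, and the sample values are hypothetical.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Hypothetical column data: the same attribute stored contiguously.
zips = ["92127"] * 4 + ["10001"] * 2
skus = ["SKU-1", "SKU-2"] * 3

# Row-style storage interleaves attributes, so repeats are rarely adjacent.
row_stream = [field for pair in zip(zips, skus) for field in pair]

column_runs = run_length_encode(zips)
row_runs = run_length_encode(row_stream)
print(len(zips), "values ->", len(column_runs), "runs (column layout)")
print(len(row_stream), "values ->", len(row_runs), "runs (row layout)")
```

The six zip codes collapse into two runs when stored as a column, while the interleaved row stream yields no repeats to collapse at all.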
Teradata says its 14 release will enable new and existing customers to mix and match columnar and row-based physical storage when it best suits an application. When contact center agents at telcos or branch managers at a bank try to answer questions for customers, for example, their queries usually involve only a few attributes of a total customer record. That's just one example of when a column-store approach might yield significantly faster results.
Oracle introduced a Hybrid Columnar Compression feature with Exadata in 2008 that squeezes data to a claimed ratio of 10-to-1. EMC Greenplum introduced a blending of row-store and column-store approaches with its polymorphic data storage approach in 2009. And Aster Data, which was acquired by Teradata in March for $263 million in cash, introduced a hybrid row/column-store approach in 2010.
Oracle's hybrid feature does not support selective, columnar querying, so it doesn't speed querying significantly like a true column-store database. Aster does do selective querying, but it does not offer columnar compression, according to independent database analyst Curt Monash. As for EMC Greenplum and Teradata, "each offers different ways to mix column and row storage in the same table with each approach offering advantages," Monash said.
The biggest challenge for Teradata customers may be figuring out when to use a row-based versus column-based approach. "You'd be looking at data-access paths and data demographics to choose between row-store and column-store objects," said Scott Gnau, president of Teradata Labs, in an interview with InformationWeek. "But this is supportive of an enterprise data warehouse approach because it eliminates the temptation to extract certain sets of information and put it on a separate, column-based platform."
A key feature of the new columnar capability is automatic compression that chooses the best compression algorithms for each column of data and that dynamically changes the compression approach as data-access patterns change.
Gnau declined to offer data-compression claims, saying rates would vary depending on the data.
He did, however, predict that Teradata's columnar approach will outperform Oracle's Hybrid Columnar Compression.
Tackling Big Data
Teradata's announcement of the Aster Data 5.0 database and an Aster MapReduce Appliance, both planned for early next year, is a bit of a coming-out party for Aster as a unit of the larger company. The database upgrades are incremental and the appliance is no surprise, but Teradata has an opportunity to put the Aster story on a bigger stage.
Teradata bought Aster to take advantage of the smaller company's innovation in blending analysis of structured data, semi-structured data, largely unstructured information, or a mix of all of the above. It does so with its SQL-MapReduce framework, which lets companies perform MapReduce processing on its SQL-based platform.
MapReduce processing is in big demand because it's useful in crunching massive quantities of Internet clickstream data, sensor data, and social-media content.
MapReduce is supported by and often associated with Hadoop, a fast-growing open-source project that is popular among Internet giants, but there's a comparatively tiny (and high-cost) pool of experts capable of deploying and managing Hadoop environments.
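For readers unfamiliar with the pattern, the sketch below shows the map, shuffle, and reduce steps in plain Python, counting page views in a tiny hypothetical clickstream. It is not Hadoop's API or Aster's SQL-MapReduce syntax, and it runs on one machine; real deployments distribute each phase across many nodes.

```python
from collections import defaultdict

# Hypothetical clickstream records: (user id, page visited).
clicks = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
    ("u3", "/home"), ("u2", "/checkout"),
]

def map_phase(records):
    """Emit a (page, 1) pair for every click."""
    for _user, page in records:
        yield page, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

page_counts = reduce_phase(shuffle(map_phase(clicks)))
print(page_counts)
```

Because each map call and each per-key reduction is independent, the work can be spread over many servers, which is what makes the approach attractive for very large data sets.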
The key benefit of Aster's SQL-MapReduce framework is that it makes MapReduce accessible to SQL-literate data professionals within Aster's SQL-based database. Thus, the platform supports pattern-detection, graph analysis, and time-series analysis on data such as clickstreams--the sorts of analyses employed to uncover Web purchase patterns or to determine the effectiveness of online and email marketing campaigns.
The Aster Data 5.0 upgrades include pre-built MapReduce modules for behavioral clickstream interpretation (why are people following certain navigation paths?), marketing attribution (which email campaigns and banners are driving purchases?), decision-tree analysis (what choices are customers making?), and other analyses. A workload management framework has also been improved to handle memory allocation of SQL and MapReduce processes.
The Aster MapReduce Appliance set for release next year will take advantage of Teradata's hardware expertise and buying power. It will put the Aster database on the hardware used for the Teradata Data Warehouse Appliance, but no details were available on cost or capacities.
Teradata is wisely retaining Aster's current offering of cloud-based or stand-alone database software, giving customers the choice of how they wish to deploy Aster Data. The primary competition to Aster is the combination of an incumbent SQL-based data warehouse and a new Hadoop deployment.
EMC bet on Hadoop last May when it introduced community and commercial Hadoop software distributions. This week the vendor added the EMC Greenplum Modular Data Computing Appliance, which is capable of hosting Greenplum (SQL) database deployments and Hadoop deployments on a single box.
Adding to its column-store announcement, Teradata enumerated other Teradata 14 upgrades intended to make data warehousing "far simpler," with automated capabilities aimed at workload management and partitioning, compression decisions, and temporal (time-based) analyses. The new workload management features are designed to give administrators fine-grained control over service levels down to CPU and data input/output (I/O) usage levels.
Teradata 14 supports virtual partitions that will enable administrators to assign service levels within service levels, giving, say, 60% of capacity to a division in Germany and 40% to a unit in the U.K., and then certain CPU and I/O levels to specific departments, functions, or queries within those offices.
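The arithmetic behind "service levels within service levels" can be sketched as follows. The 60/40 split between Germany and the U.K. comes from the example above; the departmental shares and all names in the code are hypothetical, and this is not Teradata's configuration syntax.

```python
from fractions import Fraction

# Top-level split from the example above.
division_share = {
    "Germany": Fraction(60, 100),
    "UK": Fraction(40, 100),
}

# Hypothetical departmental shares within each division.
department_share = {
    "Germany": {"finance": Fraction(1, 2), "marketing": Fraction(1, 2)},
    "UK": {"finance": Fraction(1, 4), "marketing": Fraction(3, 4)},
}

def effective_shares(divisions, departments):
    """Multiply each nested share by its parent's share of total capacity."""
    return {
        (division, department): divisions[division] * share
        for division, shares in departments.items()
        for department, share in shares.items()
    }

shares = effective_shares(division_share, department_share)
for (division, department), share in shares.items():
    print(f"{division}/{department}: {float(share):.0%} of total capacity")
```

Under these assumed numbers, U.K. finance ends up with 10% of total system capacity (25% of the U.K.'s 40%), and the nested shares still add up to the whole system.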
Teradata already had the ability to move "hot," frequently accessed data to cache or fast disks and "cold," infrequently accessed data to slower disks. The database upgrade adds a Compress on Cold feature that automatically applies appropriate levels of compression based on the same hot/cold analysis. Little-used data will be compressed at up to a 5-to-1 ratio to maximize available storage space.
New temporal capabilities will help global organizations recognize variations in business calendars from country to country. If a work week technically begins on Sunday in one country and Monday in another, this feature lets companies time-specify the data they analyze from those countries when they're trying to analyze weekly payroll or inventory, for example.
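The calendar problem is easy to illustrate in Python: the same date falls into different "weeks" depending on which day a country treats as the first of the week. The calendar names and function below are hypothetical and only sketch the idea; they are not Teradata's temporal syntax.

```python
from datetime import date, timedelta

# Python numbers weekdays Monday=0 through Sunday=6.
MONDAY, SUNDAY = 0, 6

# Hypothetical business calendars with different first days of the week.
first_weekday = {"sunday_start_calendar": SUNDAY, "monday_start_calendar": MONDAY}

def week_start(day, first_day):
    """Return the first date of the business week that contains `day`."""
    offset = (day.weekday() - first_day) % 7
    return day - timedelta(days=offset)

sample = date(2011, 9, 22)  # a Thursday
for calendar, first_day in first_weekday.items():
    print(calendar, week_start(sample, first_day))
```

For the sample Thursday, the Sunday-start calendar places the week's beginning on Sept. 18 and the Monday-start calendar on Sept. 19, so weekly payroll or inventory totals would cover different days unless the calendar is specified.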
With these latest moves, Teradata continues to keep some distance between itself and the rest of the data warehousing pack in terms of advanced features and the size and influence of its customer base. The Aster Data acquisition has brought the company into an emerging market where it had yet to make significant inroads.
EMC and IBM Netezza continue to be Teradata's closest competitors, and EMC in particular seems intent on matching or besting it in advanced areas such as in-database analytics and multi-structured data analysis.
As for Oracle, this week's release of the Oracle Database Appliance underscores that vendor's focus on mainstream database uses. It has yet to show interest in anything other than structured data or to show off a customer deployment of Exadata anywhere close to the 100-terabyte league (let alone the petabyte league). That leaves lots of room for Teradata and others at the top of the market.