6/23/2011
05:19 PM

2 Ways To Tackle Really Big Data

Marketers, telcos, and financial services firms are often swamped by machine-generated data. New products from IBM Netezza and Infobright offer radically different approaches to the challenge.

If you're trying to analyze Web clickstreams, call data records, financial trading data, log files, or other forms of machine-generated information, chances are you're playing in the "big data" league. But just how big, and how quickly you need answers, will determine your interest in the latest products from IBM Netezza and Infobright.

IBM Netezza on Wednesday announced a High Capacity Appliance aimed at really, really big data. We're talking petabytes, typical for long-term archives maintained for regulatory or compliance reasons. Infobright, meanwhile, has upgraded a column-store database that promises superfast querying of machine-generated data at more routine volumes of less than 40 terabytes. Beyond these specific products, both vendors have answers for the extremes of capacity and speed.

The IBM Netezza High Capacity Appliance is an alternative to the vendor's standard TwinFin product. It boasts four times the data density of the TwinFin thanks to higher-capacity hard drives. It also has about 35% less processing power per rack (to keep costs down and create room for more storage). The appliance stores 500 terabytes per rack, and you can put together as many as 20 racks to handle as much as 10 petabytes of user-addressable data.
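The rack arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in the article (500 terabytes per rack, up to 20 racks); the function names are hypothetical, and the conversion of 1,000 terabytes per petabyte is an assumption.

```python
import math

TB_PER_RACK = 500   # per-rack capacity quoted in the article
MAX_RACKS = 20      # maximum rack count quoted in the article
TB_PER_PB = 1000    # assumption: decimal units (1 PB = 1,000 TB)


def capacity_tb(racks: int) -> int:
    """User-addressable capacity in terabytes for a given rack count."""
    if not 1 <= racks <= MAX_RACKS:
        raise ValueError(f"rack count must be between 1 and {MAX_RACKS}")
    return racks * TB_PER_RACK


def racks_needed(target_tb: float) -> int:
    """Smallest rack count that holds target_tb terabytes."""
    if target_tb <= 0:
        raise ValueError("target capacity must be positive")
    racks = math.ceil(target_tb / TB_PER_RACK)
    if racks > MAX_RACKS:
        raise ValueError("target exceeds the 20-rack maximum")
    return racks


if __name__ == "__main__":
    full = capacity_tb(MAX_RACKS)
    print(f"{MAX_RACKS} racks hold {full} TB ({full / TB_PER_PB:g} PB)")
    print(f"A 1,200 TB archive needs {racks_needed(1200)} racks")
```

A fully built-out system of 20 racks works out to the 10 petabytes cited above.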

Who needs to query that much data? Telcos operating in many countries (India being one example) are required to keep call data records (CDRs) for as long as 10 years so law enforcement agencies can request relevant information. Government intelligence agencies and financial services subject to retention requirements often keep that much data around as well.

This niche was previously addressed by both Teradata, which introduced its Extreme Data Appliance in 2009, and EMC, which introduced its High Capacity EMC Data Computing Appliance in April.

Fast querying is generally not important when you're retrieving records to meet regulatory requirements. Thus, the EMC, IBM Netezza, and Teradata high-capacity appliances all favor storage over speed. For example, an identical query will run about 2.5 times faster on the Netezza TwinFin than on that vendor's high-capacity appliance. The TwinFin, however, can't match the low cost per terabyte of the IBM Netezza High Capacity Appliance, which works out to less than $2,500 per terabyte, according to Netezza (less than a quarter the cost of the TwinFin).
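To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python using only the two figures Netezza quotes: a cost of less than $2,500 per terabyte, and queries that run about 2.5 times slower than on the TwinFin. The function names are hypothetical, and both results are approximations: the cost is a ceiling, not a price quote, and real query times will vary by workload.

```python
COST_CEILING_USD_PER_TB = 2_500   # "less than $2,500 per terabyte," per Netezza
TWINFIN_SPEEDUP = 2.5             # TwinFin runs the same query about 2.5x faster


def hca_cost_ceiling_usd(capacity_tb: float) -> float:
    """Upper bound on High Capacity Appliance cost for a given capacity."""
    if capacity_tb < 0:
        raise ValueError("capacity must be non-negative")
    return capacity_tb * COST_CEILING_USD_PER_TB


def hca_query_seconds(twinfin_seconds: float) -> float:
    """Approximate High Capacity Appliance time for a query timed on TwinFin."""
    if twinfin_seconds < 0:
        raise ValueError("query time must be non-negative")
    return twinfin_seconds * TWINFIN_SPEEDUP


if __name__ == "__main__":
    print(f"500 TB costs under ${hca_cost_ceiling_usd(500):,.0f}")
    print(f"A 10-second TwinFin query takes roughly "
          f"{hca_query_seconds(10):g} seconds on the high-capacity appliance")
```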

Plenty of companies need both high capacity and super-fast querying. The likes of EMC, IBM Netezza, and Teradata would likely suggest combining one of their high-capacity appliances with one of their high-performance appliances. At the opposite end of the speed-versus-scale spectrum, Teradata and EMC both have pure-solid-state appliances (Teradata's being the Extreme Performance Appliance and EMC's being the High Performance EMC Data Computing Appliance). These products have less capacity but about 10 times the speed of each vendor's standard appliance.

IBM Netezza announced Wednesday that it will get in on this act sometime next year with an Ultra Performance appliance employing a combination of flash memory and RAM (a contrast with the solid-state disk drives used by Teradata and EMC). Having a high-performance appliance and a high-capacity appliance gives you the best of both worlds, but it's also no small investment.
