Marketers, telcos, and financial services firms are often swamped by machine-generated data. New products from IBM Netezza and Infobright offer radically different approaches to the challenge.
If you're trying to analyze Web clickstreams, call data records, financial trading data, log files, or other forms of machine-generated information, chances are you're playing in the "big data" league. But just how big, and how quickly you need answers, will determine your interest in the latest products from IBM Netezza and Infobright.
IBM Netezza on Wednesday announced a High Capacity Appliance aimed at really, really big data. We're talking petabytes, typical for long-term archives maintained for regulatory or compliance reasons. Infobright, meanwhile, has upgraded a column-store database that promises superfast querying of machine-generated data at more routine volumes of less than 40 terabytes. Beyond these specific products, both vendors have answers for the extremes of capacity and speed.
The IBM Netezza High Capacity Appliance is an alternative to the vendor's standard TwinFin product. It boasts four times the data density of the TwinFin thanks to higher-capacity hard drives. It also has about 35% less processing power per rack (to keep costs down and create room for more storage). The appliance stores 500 terabytes per rack, and you can put together as many as 20 racks to handle as much as 10 petabytes of user-addressable data.
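To make the capacity arithmetic concrete, here is a minimal sketch in Python that uses only the per-rack and rack-count figures quoted above. The function names are hypothetical, and the conversion of 1,000 terabytes per petabyte is an assumption (decimal units); it is not a vendor sizing tool.

```python
import math

# Figures quoted in the article for the High Capacity Appliance.
TB_PER_RACK = 500   # user-addressable terabytes per rack
MAX_RACKS = 20      # largest configuration mentioned

# Assumption: decimal units, 1 petabyte = 1,000 terabytes.
TB_PER_PB = 1000


def max_capacity_pb() -> float:
    """Capacity of the largest configuration, in petabytes."""
    return TB_PER_RACK * MAX_RACKS / TB_PER_PB


def racks_needed(archive_tb: float) -> int:
    """Smallest whole number of racks that can hold archive_tb terabytes."""
    if archive_tb < 0:
        raise ValueError("archive size cannot be negative")
    racks = math.ceil(archive_tb / TB_PER_RACK)
    if racks > MAX_RACKS:
        raise ValueError("archive exceeds the 20-rack maximum")
    return racks


if __name__ == "__main__":
    print(max_capacity_pb())    # largest configuration, in PB
    print(racks_needed(1200))   # racks for a hypothetical 1.2 PB archive
```

The rounding up matters: a hypothetical 1,200-terabyte archive does not fit in two racks, so it needs a third.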
Who needs to query that much data? Telcos operating in many countries (India being one example) are required to keep call data records (CDRs) for as long as 10 years so law enforcement agencies can request relevant information. Government intelligence agencies and financial services subject to retention requirements often keep that much data around as well.
Fast querying is generally not important when you're retrieving records to meet regulatory requirements. Thus, the EMC, IBM Netezza, and Teradata high-capacity appliances all favor storage over speed. For example, an identical query will run about 2.5 times faster on the Netezza TwinFin than it will run on that vendor's high-capacity appliance. The TwinFin, however, can't match the low cost per terabyte of the IBM Netezza High Capacity Appliance, which works out to less than $2,500 per terabyte, according to Netezza (less than a quarter the cost of the TwinFin).
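The speed-versus-cost tradeoff can be put in rough numbers. The sketch below, in Python, assumes only the two figures Netezza cites (a ceiling of $2,500 per terabyte and a roughly 2.5x query-time gap); the function names are hypothetical, and the results are back-of-the-envelope bounds, not price quotes or benchmarks.

```python
# Figures cited by Netezza in the article.
HIGH_CAPACITY_USD_PER_TB_CEILING = 2500  # "less than $2,500 per terabyte"
TWINFIN_SPEEDUP = 2.5                    # TwinFin runs a query ~2.5x faster


def cost_ceiling_usd(capacity_tb: float) -> float:
    """Upper bound on High Capacity Appliance cost for capacity_tb terabytes."""
    return capacity_tb * HIGH_CAPACITY_USD_PER_TB_CEILING


def high_capacity_query_seconds(twinfin_seconds: float) -> float:
    """Approximate run time on the high-capacity appliance for a query
    that takes twinfin_seconds on a TwinFin."""
    return twinfin_seconds * TWINFIN_SPEEDUP


if __name__ == "__main__":
    # One 500-terabyte rack costs less than this many dollars.
    print(cost_ceiling_usd(500))
    # A hypothetical one-minute TwinFin query, run on the high-capacity box.
    print(high_capacity_query_seconds(60))
```

Note that the article's figures do not pin down a TwinFin price: the high-capacity cost is stated only as a ceiling, so "less than a quarter" cannot be turned into a specific TwinFin cost per terabyte.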
Plenty of companies need both high capacity and super-fast querying. The likes of EMC, IBM Netezza, and Teradata would likely suggest combining one of their high-capacity appliances with one of their high-performance appliances. At the opposite end of the speed-versus-scale spectrum, Teradata and EMC both have pure solid-state appliances (Teradata's being the Extreme Performance Appliance and EMC's being the High Performance EMC Data Computing Appliance). These products have less capacity but about 10 times the speed of each vendor's standard appliance.
IBM Netezza announced Wednesday that it will get in on this act sometime next year with an Ultra Performance appliance employing a combination of flash memory and RAM (a contrast with the solid-state disk drives used by Teradata and EMC). Having a high-performance appliance and a high-capacity appliance gives you the best of both worlds, but it's also no small investment.