7/27/2012 03:55 PM

How Hadoop Cuts Big Data Costs

Hadoop systems, including hardware and software, cost about $1,000 a terabyte, or as little as one-twentieth the cost of other data management technologies, says Cloudera exec.

12 Hadoop Vendors To Watch In 2012
Managing prodigious volumes of data is not only challenging from a technological standpoint, it's often expensive as well. Apache Hadoop is a data management system adept at bringing data processing and analysis to raw storage. It's a cost-effective alternative to a conventional extract, transform, and load (ETL) process, which extracts data from different systems, converts it into a structure suitable for analysis and reporting, and loads it into a database.
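
To make that contrast concrete, here is a minimal sketch, not drawn from the article, of the pattern Zedlewski describes: a Hadoop MapReduce job that reads raw, untransformed log files where they already sit in HDFS and aggregates them, rather than pushing the data through a separate ETL pipeline first. The class names, the whitespace-delimited field layout, and the paths are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawLogCount {

    // Map step: read each raw line exactly as it was landed in HDFS and emit
    // (first whitespace-delimited field, 1) -- no separate ETL pass beforehand.
    public static class FieldMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\\s+");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                outKey.set(fields[0]);   // e.g., a source system or event type (hypothetical layout)
                context.write(outKey, ONE);
            }
        }
    }

    // Reduce step: sum the per-key counts.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw log count");
        job.setJarByClass(RawLogCount.class);
        job.setMapperClass(FieldMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // raw files already sitting in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // results for downstream analysis
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}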

"Big data tends to overwhelm ETL as a process," said Charles Zedlewski, VP of product at Cloudera, during a Carahsoft-hosted webinar this week. Cloudera sells a data-management platform built on Hadoop open-source software.

"The opportunity for big data has been stymied by the limitations of today's current data management architecture," said Zedlewski, who called Hadoop "very attractive" for advanced analytics and data processing.

Enterprises that ingest massive amounts of data--50 terabytes per day, for instance--aren't well-served by ETL systems. "It's very common to hear about people who are starting to miss the ETL window," Zedlewski said. "The number of hours it takes to pre-process data before they can make use out of it has grown from four hours to five, six. In many cases, the amount of time is exceeding 24 hours."

In other words, there aren't enough hours in the day to process the volume of data received in a 24-hour period. Hadoop, by comparison, performs advanced data processing and analysis at very high speeds. It's highly scalable and flexible, too.

[ Read IT's Next Hot Job: Hadoop Guru. ]

"Scalability is obviously very essential for big data projects--the whole point is that it's big," Zedlewski said. With Hadoop it's possible to store--and actually ask questions of--100 petabytes of data. "That's something that was never before possible, and is arguably at least 10 times more scalable than the next best alternative," he added.

Apache Hadoop is more than six years old and was developed to help Internet-based companies deal with prodigious volumes of data. A Hadoop system typically integrates with databases or data warehouses. "It's common that Hadoop is used in conjunction with databases. In the Hadoop world, databases don't go away. They just play a different role than Hadoop does," said Zedlewski.

Hadoop's most powerful attribute is its flexibility. "This is probably the single greatest reason why people are attracted to the system," said Zedlewski. Hadoop lets you store and capture all kinds of different data, including documents, images, and video, and make it readily available for processing and analysis.
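
As an illustration of that flexibility, the short sketch below, an assumption on my part rather than anything shown by Cloudera, uses Hadoop's FileSystem API to land files of different formats in HDFS exactly as they are, so the raw bytes are immediately available to later processing jobs. The file names and paths are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadRawFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml settings on the classpath
        FileSystem fs = FileSystem.get(conf);

        // Copy local files of any format into HDFS without converting or schematizing them first.
        fs.copyFromLocalFile(new Path("/data/incoming/report.pdf"),
                             new Path("/raw/docs/report.pdf"));
        fs.copyFromLocalFile(new Path("/data/incoming/camera01.mp4"),
                             new Path("/raw/video/camera01.mp4"));

        fs.close();
    }
}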

The cost of a Hadoop data management system, including hardware, software, and other expenses, comes to about $1,000 a terabyte--about one-fifth to one-twentieth the cost of other data management technologies, Zedlewski estimated. Pre-existing data management technologies, by comparison, might make big data projects uneconomical.

"If you look at network storage, it's not unreasonable to think of a number on the order of about $5,000 per terabyte," said Zedlewski. "Sometimes it goes much higher than that. If you look at databases, data marts, data warehouses, and the hardware that supports them, it's not uncommon to talk about numbers more like $10,000 or $15,000 a terabyte."

And because legacy data management technologies often store multiple copies of the same data on different systems, the total cost can climb to more like $30,000 to $40,000 per terabyte, Zedlewski said.
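
To see how those figures compound, here is a rough back-of-the-envelope sketch that applies the quoted per-terabyte numbers to a hypothetical 500 TB data set; the volume, the midpoint warehouse price, and the three-copy assumption are illustrative, not from the article.

public class CostComparison {
    public static void main(String[] args) {
        final double terabytes = 500;              // hypothetical data volume
        final double hadoopPerTb = 1_000;          // ~$1,000/TB, hardware plus software
        final double networkStoragePerTb = 5_000;  // "on the order of about $5,000"
        final double warehousePerTb = 12_500;      // midpoint of the $10,000-$15,000 range
        final int duplicateCopies = 3;             // legacy stacks often hold several copies

        System.out.printf("Hadoop:              $%,.0f%n", terabytes * hadoopPerTb);
        System.out.printf("Network storage:     $%,.0f%n", terabytes * networkStoragePerTb);
        System.out.printf("Data warehouse:      $%,.0f%n", terabytes * warehousePerTb);
        System.out.printf("Warehouse x%d copies: $%,.0f%n",
                duplicateCopies, terabytes * warehousePerTb * duplicateCopies);
    }
}

On those assumptions the totals run from roughly $500,000 for Hadoop to well over $15 million for a multi-copy warehouse footprint, which is the economics Zedlewski is pointing to.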

Hadoop isn't a cure-all for every use case, but it has proven effective in a variety of industries. In manufacturing, for instance, Hadoop is used to assess product quality. In telecommunications, it's used for content mediation. And it's popular among government agencies for a variety of applications, including security, search, geo-spatial data, and location-based push of data.

