Hadoop systems, including hardware and software, cost about $1,000 a terabyte, or as little as one-twentieth the cost of other data management technologies, says Cloudera exec.
Managing prodigious volumes of data is not only challenging from a technological standpoint, it's often expensive as well. Apache Hadoop is a data management system adept at bringing data processing and analysis to raw storage. It's a cost-effective alternative to a conventional extract, transform, and load (ETL) process that extracts data from different systems, converts it into a structure suitable for analysis and reporting, and loads it into a database.
"The opportunity for big data has been stymied by the limitations of today's data management architecture," said Cloudera's Zedlewski, who called Hadoop "very attractive" for advanced analytics and data processing.
Enterprises that ingest massive amounts of data--50 terabytes per day, for instance--aren't well-served by ETL systems. "It's very common to hear about people who are starting to miss the ETL window," Zedlewski said. "The number of hours it takes to pre-process data before they can make use out of it has grown from four hours to five, six. In many cases, the amount of time is exceeding 24 hours."
In other words, there aren't enough hours in the day to process the volume of data received in a 24-hour period. Hadoop, by comparison, performs advanced data processing and analysis at very high speeds by spreading the work across clusters of commodity servers. It's highly scalable and flexible, too.
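Missing the ETL window is simple arithmetic: if each day's data takes longer than the hours available to pre-process, unprocessed work accumulates day after day. The following Python sketch models that backlog under simplifying assumptions: a constant processing time per day and a pipeline that can run around the clock (24 available hours per day). The function name `backlog_hours` is hypothetical and not part of any Hadoop or ETL product.

```python
def backlog_hours(processing_hours_per_day: float, days: int,
                  window_hours: float = 24.0) -> float:
    """Hours of unprocessed work left after `days` days.

    Assumption (illustrative only): every day's data needs
    `processing_hours_per_day` hours of pre-processing, and at most
    `window_hours` of processing can be done per day.
    """
    backlog = 0.0
    for _ in range(days):
        # Add today's work, subtract today's capacity; backlog can't go negative.
        backlog = max(0.0, backlog + processing_hours_per_day - window_hours)
    return backlog


if __name__ == "__main__":
    for hours in (4, 6, 26):
        print(f"{hours} h/day -> backlog after 7 days: {backlog_hours(hours, 7)} h")
```

With the four- or six-hour figures Zedlewski cites, the backlog stays at zero. At 26 hours per day, a week leaves 14 hours of unprocessed work, and the gap widens every day after that.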
"Scalability is obviously very essential for big data projects--the whole point is that it's big," Zedlewski said. With Hadoop it's possible to store--and actually ask questions of--100 petabytes of data. "That's something that was never before possible, and is arguably at least 10 times more scalable than the next best alternative," he added.
Apache Hadoop is more than six years old and was developed to help Internet-based companies deal with prodigious volumes of data. A Hadoop system typically integrates with databases or data warehouses. "It's common that Hadoop is used in conjunction with databases. In the Hadoop world, databases don't go away. They just play a different role than Hadoop does," said Zedlewski.
Hadoop's most powerful attribute is its flexibility. "This is probably the single greatest reason why people are attracted to the system," said Zedlewski. Hadoop lets you store and capture all kinds of different data, including documents, images, and video, and make it readily available for processing and analysis.
The cost of a Hadoop data management system, including hardware, software, and other expenses, comes to about $1,000 a terabyte--about one-fifth to one-twentieth the cost of other data management technologies, Zedlewski estimated. Pre-existing data management technologies, by comparison, might make big data projects uneconomical.
"If you look at network storage, it's not unreasonable to think of a number on the order of about $5,000 per terabyte," said Zedlewski. "Sometimes it goes much higher than that. If you look at databases, data marts, data warehouses, and the hardware that supports them, it's not uncommon to talk about numbers more like $10,000 or $15,000 a terabyte."
And because legacy data management technologies often store multiple copies of the same data on different systems, the total cost might be more like $30,000 to $40,000 per terabyte, Zedlewski said.
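Zedlewski's per-terabyte estimates can be turned into cost multiples and project totals. The Python sketch that follows uses only the figures quoted above; the 100-terabyte project size and the helper names `cost_multiple` and `project_cost` are hypothetical, chosen for illustration, and real costs vary by deployment.

```python
# Per-terabyte cost estimates (USD) quoted in the article; ranges are (low, high).
HADOOP_PER_TB = 1_000
ALTERNATIVES_PER_TB = {
    "network storage": (5_000, 5_000),
    "databases, data marts, warehouses": (10_000, 15_000),
    "legacy systems with duplicate copies": (30_000, 40_000),
}


def cost_multiple(alt_per_tb: float, hadoop_per_tb: float = HADOOP_PER_TB) -> float:
    """How many times more an alternative costs per terabyte than Hadoop."""
    return alt_per_tb / hadoop_per_tb


def project_cost(terabytes: float, per_tb: float) -> float:
    """Total cost of storing `terabytes` at a flat per-terabyte rate."""
    return terabytes * per_tb


if __name__ == "__main__":
    size_tb = 100  # hypothetical project size
    print(f"Hadoop: ${project_cost(size_tb, HADOOP_PER_TB):,.0f}")
    for name, (low, high) in ALTERNATIVES_PER_TB.items():
        print(f"{name}: ${project_cost(size_tb, low):,.0f} to "
              f"${project_cost(size_tb, high):,.0f} "
              f"({cost_multiple(low):.0f}x to {cost_multiple(high):.0f}x Hadoop)")
```

For a 100-terabyte project, these estimates put Hadoop at about $100,000, compared with $500,000 for network storage and $1 million to $1.5 million for databases and data warehouses.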
Hadoop isn't a cure-all, but it has proven effective in a variety of industries. In manufacturing, for instance, Hadoop is used to assess product quality. In telecommunications, it's used for content mediation. And it's popular among government agencies for a variety of applications, including security, search, geo-spatial data, and location-based data push.