Hadoop systems, including hardware and software, cost about $1,000 a terabyte, or as little as one-twentieth the cost of other data management technologies, says Cloudera exec.
Managing prodigious volumes of data is not only technologically challenging but often expensive as well. Apache Hadoop is a data management system adept at bringing data processing and analysis to raw storage. It's a cost-effective alternative to a conventional extract, transform, and load (ETL) process that extracts data from different systems, converts it into a structure suitable for analysis and reporting, and loads it into a database.
"The opportunity for big data has been stymied by the limitations of today's current data management architecture," said Cloudera executive Zedlewski, who called Hadoop "very attractive" for advanced analytics and data processing.
Enterprises that ingest massive amounts of data--50 terabytes per day, for instance--aren't well served by ETL systems. "It's very common to hear about people who are starting to miss the ETL window," Zedlewski said. "The number of hours it takes to pre-process data before they can make use out of it has grown from four hours to five, six. In many cases, the amount of time is exceeding 24 hours."
In other words, there aren't enough hours in the day to process the volume of data received in a 24-hour period. Hadoop, by comparison, performs advanced data processing and analysis at very high speeds. It's highly scalable and flexible, too.
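The arithmetic behind a missed ETL window can be sketched in a few lines of Python. This is only an illustration: the 50-terabyte daily volume comes from the example above, while the throughput figures and the function names are hypothetical values chosen to show the effect, not measurements of any real system.

```python
# Sketch of the "ETL window" arithmetic described above.
# The throughput numbers used below are hypothetical, chosen for illustration.

HOURS_PER_DAY = 24.0


def hours_to_process(daily_volume_tb, throughput_tb_per_hour):
    """Hours needed to pre-process one day's worth of incoming data."""
    return daily_volume_tb / throughput_tb_per_hour


def misses_window(daily_volume_tb, throughput_tb_per_hour, window_hours=HOURS_PER_DAY):
    """True when one day's data takes longer to process than the window allows."""
    return hours_to_process(daily_volume_tb, throughput_tb_per_hour) > window_hours


def backlog_after_days(daily_volume_tb, throughput_tb_per_hour, days):
    """Terabytes left unprocessed after `days` of running around the clock."""
    processed_per_day = throughput_tb_per_hour * HOURS_PER_DAY
    daily_shortfall = max(0.0, daily_volume_tb - processed_per_day)
    return daily_shortfall * days


if __name__ == "__main__":
    daily_volume_tb = 50.0  # from the article's example
    for throughput in (2.5, 2.0):  # hypothetical TB per hour
        hours = hours_to_process(daily_volume_tb, throughput)
        print(f"{throughput} TB/h: {hours:.1f} hours per day of data, "
              f"misses window: {misses_window(daily_volume_tb, throughput)}")
    print("Backlog after 7 days at 2.0 TB/h:",
          backlog_after_days(daily_volume_tb, 2.0, 7), "TB")
```

Under these assumed numbers, a system that falls even slightly short of the daily volume never catches up, and the backlog grows every day.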
"Scalability is obviously very essential for big data projects--the whole point is that it's big," Zedlewski said. With Hadoop it's possible to store--and actually ask questions of--100 petabytes of data. "That's something that was never before possible, and is arguably at least 10 times more scalable than the next best alternative," he added.
Apache Hadoop is more than six years old and was developed to help Internet-based companies deal with prodigious volumes of data. A Hadoop system typically integrates with databases or data warehouses. "It's common that Hadoop is used in conjunction with databases. In the Hadoop world, databases don't go away. They just play a different role than Hadoop does," said Zedlewski.
Hadoop's most powerful attribute is its flexibility. "This is probably the single greatest reason why people are attracted to the system," said Zedlewski. Hadoop lets you store and capture all kinds of different data, including documents, images, and video, and make it readily available for processing and analysis.
The cost of a Hadoop data management system, including hardware, software, and other expenses, comes to about $1,000 a terabyte--about one-fifth to one-twentieth the cost of other data management technologies, Zedlewski estimated. Pre-existing data management technologies, by comparison, might make big data projects uneconomical.
"If you look at network storage, it's not unreasonable to think of a number on the order of about $5,000 per terabyte," said Zedlewski. "Sometimes it goes much higher than that. If you look at databases, data marts, data warehouses, and the hardware that supports them, it's not uncommon to talk about numbers more like $10,000 or $15,000 a terabyte."
And because legacy data management technologies often store multiple copies of the same data on different systems, the total cost might be more like $30,000 to $40,000 per terabyte, Zedlewski said.
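To put the quoted per-terabyte estimates side by side, here is a minimal Python sketch. The dollar figures are the estimates attributed to Zedlewski above; the category labels, function names, and the 100-terabyte project size are assumptions made only for illustration.

```python
# Per-terabyte cost estimates quoted in the article, in US dollars.
# Each entry is a (low, high) range; labels are shorthand, not official terms.
HADOOP_COST_PER_TB = 1_000

QUOTED_ALTERNATIVES = {
    "network storage": (5_000, 5_000),
    "databases and data warehouses": (10_000, 15_000),
    "legacy systems with duplicate copies": (30_000, 40_000),
}


def cost_multiple(alternative_cost_per_tb, hadoop_cost_per_tb=HADOOP_COST_PER_TB):
    """How many times more expensive the alternative is per terabyte."""
    return alternative_cost_per_tb / hadoop_cost_per_tb


def total_cost(terabytes, cost_per_tb):
    """Total cost of storing `terabytes` at a given per-terabyte price."""
    return terabytes * cost_per_tb


if __name__ == "__main__":
    project_size_tb = 100  # hypothetical project size
    print(f"Hadoop: ${total_cost(project_size_tb, HADOOP_COST_PER_TB):,}")
    for name, (low, high) in QUOTED_ALTERNATIVES.items():
        print(f"{name}: ${total_cost(project_size_tb, low):,} to "
              f"${total_cost(project_size_tb, high):,} "
              f"({cost_multiple(low):.0f}x to {cost_multiple(high):.0f}x Hadoop)")
```

The one-fifth end of the quoted range lines up with the $5,000 network storage figure. The one-twentieth end would correspond to $20,000 a terabyte, which falls between the quoted database and duplicated-legacy estimates.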
Hadoop isn't a cure-all for every use case, but it has proven effective in a variety of industries. In manufacturing, for instance, Hadoop is used to assess product quality. In telecommunications, it's used for content mediation. And it's popular among government agencies for a variety of applications, including security, search, geo-spatial data, and location-based push of data.