Hadoop Spurs Big Data Revolution - InformationWeek
11:20 AM


Open source data processing platform has won over Web giants for its low cost, scalability, and flexibility. Now Hadoop will make its way into more enterprises.

Hadoop Basics

Inspired in large part by a 2004 white paper in which Google described its use of MapReduce techniques, Hadoop is a Java-based software framework for distributed processing of data-intensive transformations and analyses. MapReduce breaks a big data problem into subproblems; distributes them onto tens, hundreds, and even thousands of processing nodes; and then combines the results into a smaller, easy-to-analyze data set.
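To make the split, distribute, and combine pattern concrete, here is a minimal single-process sketch in plain Python of the classic word-count job. It does not use Hadoop's Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the simulated worker count are hypothetical, and the "nodes" are just list slices processed one after another rather than real machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single, smaller result."""
    return key, sum(values)


def run_job(lines: List[str], num_workers: int = 3) -> Dict[str, int]:
    # Hypothetical partitioning: deal the input records out to num_workers chunks,
    # one per simulated node.
    chunks = [lines[i::num_workers] for i in range(num_workers)]
    # Each chunk could be mapped independently on its own node; here they run sequentially.
    intermediate = [pair for chunk in chunks for line in chunk for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    # Each key could be reduced on a different node; again, sequential in this sketch.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data beats opinions", "big clusters"]
    print(run_job(docs))
```

Because each map call only sees its own record and each reduce call only sees one key's values, the result does not depend on how the input is split, which is what lets a real cluster spread the same work across many machines.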

Hadoop includes several important subprojects and related Apache projects. The Hadoop Distributed File System (HDFS) gives the platform massive yet low-cost storage capacity. The Pig data-flow language is used to write parallel processing jobs. The HBase distributed, column-oriented database gives Hadoop a structured-data storage option for large tables. And the Hive distributed data warehouse supports data summarization and ad hoc querying.
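HDFS gets its low-cost capacity by spreading files across many commodity disks. The plain-Python sketch below assumes the basic scheme: a file is cut into large fixed-size blocks, and each block is copied to several distinct nodes (Hadoop's default replication factor is 3). The 64 MB block size in the example, the node names, and the round-robin placement are illustrative assumptions; real HDFS uses a rack-aware placement policy and tracks block locations in its NameNode.

```python
from typing import Dict, List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using simple round-robin
    (a simplification, not HDFS's rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Consecutive nodes starting at `start` are all distinct because replication <= len(nodes).
        placement[block_id] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(1000, 64)  # a 1,000 MB file in assumed 64 MB blocks
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4", "n5"])
    print(len(blocks), blocks[-1], layout[0])
```

One consequence of this design: with three copies of every block, a cluster can hold only about a third of its raw disk capacity as unique data, a trade-off Hadoop accepts in exchange for surviving the failure of individual cheap nodes.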

Hadoop gets its well-known scalability from its ability to distribute large-scale data processing jobs across thousands of compute nodes built on low-cost x86 servers. Its capacity is constantly increasing, thanks to Moore's Law and ever-rising memory and disk drive capacity. The latest supporting hardware deployments combine 16 compute cores, 128 MB of RAM, and as much as 12 TB or even 24 TB of hard disk capacity per node. The cost of each node is about $4,000, according to Cloudera, the leading provider of commercial support and enterprise management software for Hadoop deployments. At 12 TB to 24 TB of disk per node, that works out to roughly $170 to $330 per terabyte of raw storage, a fraction of the $10,000 to $12,000 per terabyte for the most competitively priced relational database deployments.

This high-capacity and low-cost combination is compelling enough, but Hadoop's other appeal is its ability to handle mixed data types. It can manage structured data as well as highly variable data sources, such as sensor and server log files and Web clickstreams. It can also manage unstructured, text-centric data sources, such as feeds from Facebook and Twitter. ("Loosely structured" or "free form" are actually more accurate descriptions of this type of data, but "unstructured" is the description that has stuck.)
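One way to picture this flexibility is "schema on read": raw records are stored as they arrive and interpreted only when a job reads them. The minimal Python sketch below assumes three hypothetical record shapes (a JSON clickstream event, a timestamped server log line, and a free-form social post) and maps each one to a common set of fields at read time. The formats, field names, and the `parse_record` function are invented for illustration and are not part of Hadoop.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(\S+) (INFO|WARN|ERROR) (.*)$")


def parse_record(raw: str) -> Dict[str, Optional[str]]:
    """Interpret one raw record at read time instead of forcing a fixed schema at load time."""
    raw = raw.strip()
    # 1. Semi-structured: a JSON clickstream event (hypothetical "user" and "url" fields).
    if raw.startswith("{"):
        try:
            event = json.loads(raw)
            return {"kind": "click", "user": event.get("user"), "text": event.get("url")}
        except json.JSONDecodeError:
            pass  # not valid JSON: fall through and try the other shapes
    # 2. Variable but patterned: a server log line.
    match = LOG_PATTERN.match(raw)
    if match:
        return {"kind": "log", "user": None, "text": match.group(3)}
    # 3. Loosely structured: free-form text such as a social-media post.
    return {"kind": "text", "user": None, "text": raw}


if __name__ == "__main__":
    records = [
        '{"user": "u1", "url": "/home"}',
        "2011-11-01T10:00:00 ERROR disk full",
        "Loving the new phone!",
    ]
    for record in records:
        print(parse_record(record))
```

Nothing about the stored records has to change when a new shape shows up; only the reading code does, which is the opposite of loading data into a table whose columns are fixed in advance.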

This ability to handle various types of data is so important that it has spawned the broader NoSQL (not only SQL) movement. Platforms and products such as Cassandra, CouchDB, MongoDB, and Oracle's new NoSQL database address the need for data flexibility in transactional processing, while Hadoop has garnered most of the attention on the data analysis side.

Relational databases, such as IBM DB2, Oracle, Microsoft SQL Server, and MySQL, can't handle mixed data types and unstructured data, because such data doesn't fit into the columns and rows of a predefined data model (see "Hadoop's Flexibility Wins Over Online Data Provider").

D. Henschen
User Rank: Author
6/15/2012 | 10:10:09 PM
re: Hadoop Spurs Big Data Revolution
That's per core, but these stats have all been surpassed with the latest hardware.
User Rank: Apprentice
12/6/2011 | 4:08:09 AM
re: Hadoop Spurs Big Data Revolution
Reading through the whole article, I see only one mention of Yahoo, and no mention of Yahoo as the originator of Hadoop. It sometimes appears that the press is intent on highlighting all of Yahoo's weaknesses and none of its strengths. Perhaps you think this information is already well known, but the pie chart showing that 74% have "no current or planned use" would suggest otherwise. For those who want meatier detail, see http://developer.yahoo.com/had....
User Rank: Apprentice
12/4/2011 | 8:59:29 PM
re: Hadoop Spurs Big Data Revolution
Matspca - we're working on establishing a benchmark for Hadoop. If you'd like to participate, please let me know at indu.kodukula@sungard.com
User Rank: Apprentice
11/30/2011 | 11:01:45 PM
re: Hadoop Spurs Big Data Revolution
Not everyone believes the hype around Hadoop. See http://www.vertica.com/2011/09... The big organizations mentioned here can afford to use non-optimal solutions. I have seen no benchmark showing Hadoop beating, say, Oracle. My own NoSQL database beats Hadoop by a large margin using a $330 PC versus the $1 million (or so) used by Hadoop for the same benchmark. See http://www.velocitydb.com/Comp...

I will continue following the hype around Hadoop, and if there really is some substance behind it, then I look forward to a .NET version of the distribution mechanism.
User Rank: Apprentice
11/10/2011 | 8:48:32 PM
re: Hadoop Spurs Big Data Revolution
128 MB of RAM for 16 cores? That has to be a typo.