10 Hadoop Hardware Leaders - InformationWeek

News | Slideshows
Doug Henschen
4/24/2014 09:06 AM
10 Hadoop Hardware Leaders

Hadoop is known for running on "industry standard hardware," but just what does that mean? We break down popular options and a few interesting niche choices.

Hadoop software is designed to orchestrate massively parallel processing on relatively low-cost servers that pack plenty of storage close to the processing power. All the power, reliability, redundancy, and fault tolerance are built into the software, which distributes the data and processing across tens, hundreds, or even thousands of "nodes" in a clustered server configuration.

Those nodes are "industry standard" x86 servers that cost $2,500 to $15,000 each, depending on CPU, RAM, and disk choices. They're usually middle-of-the-road servers in terms of performance specs. A standard DataNode (a.k.a. worker node) server, for example, is typically a 2U rack server with two Intel Sandy Bridge or Ivy Bridge CPU sockets and a total of 12 cores, fitted with 64 GB to 128 GB of RAM per CPU. DataNodes usually have a dozen 2-TB or 3-TB 3.5-inch hard drives in a JBOD (just a bunch of disks) configuration. [Editor's note: The upper end of the price range quoted above was raised to $15,000 (from $5,000) per server to reflect the inclusion of 12 high-capacity drives in addition to the (typically) two standard disks per server.]
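To see what that standard DataNode spec buys in practice, here is a back-of-the-envelope capacity calculation. The overhead figure and replication factor below are illustrative assumptions, not numbers from this article: HDFS defaults to 3x replication, and some raw disk is typically reserved for the OS, logs, and intermediate data.

```python
# Rough usable-capacity estimate for a cluster of standard DataNodes.
# Assumptions (illustrative): 3x HDFS replication, ~25% of raw disk
# reserved for non-HDFS use (OS, logs, intermediate MapReduce output).

def usable_capacity_tb(nodes, drives_per_node=12, drive_tb=3,
                       replication=3, overhead=0.25):
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb * (1 - overhead) / replication

# A 20-node cluster of the 12 x 3-TB DataNodes described above:
print(usable_capacity_tb(20))  # 180.0 usable TB from 720 TB raw
```

The point the math makes: triple replication plus operating overhead means a cluster delivers roughly a quarter of its raw disk as usable HDFS capacity, which is why node counts climb quickly.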

Companies seeking a bit more performance (for Spark in-memory analysis or Cloudera Impala, for example) might choose slightly higher clock speeds and 256 GB or more of RAM per CPU, while those seeking maximum capacity are choosing 4-TB hard drives.

Management nodes running Hadoop's NameNode (which coordinates data storage) and JobTracker (which coordinates data processing) require less storage but benefit from more reliable power supplies, enterprise-grade disks, RAID redundancy, and a bit more RAM. Connecting the nodes together is a job for redundant 10-Gigabit Ethernet or InfiniBand switches.
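That division of labor shows up directly in Hadoop's configuration files. As a sketch (the mount points below are hypothetical examples, not prescribed paths), a DataNode lists its JBOD disks individually, while the NameNode mirrors its metadata across redundant directories, typically RAID-backed local disk plus an NFS mount:

```xml
<!-- hdfs-site.xml on a DataNode: one entry per JBOD disk (example mounts) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1/dfs,/data/disk2/dfs,/data/disk3/dfs</value>
</property>

<!-- hdfs-site.xml on the NameNode: redundant metadata dirs (example paths) -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/raid/dfs/name,/mnt/nfs/dfs/name</value>
</property>
```

For data disks, a lost drive costs only the blocks on it, which HDFS re-replicates from other nodes; for NameNode metadata, a loss is catastrophic, hence the mirrored directories and beefier hardware.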

It's not uncommon for huge Fortune 100 companies to buy so-called whitebox servers from no-name OEMs for less than $2,500 a crack in high volumes, but it's more typical for the average enterprise to work with Tier 1 vendors such as Cisco, Dell, HP, and IBM. All of these manufacturers now offer servers specifically configured for Hadoop reference architectures for Cloudera, Hortonworks, MapR, and other Hadoop distributions.

Hadoop practitioners often build their own clusters, but appliances have emerged over the last two years offering the convenience of buying everything, including preinstalled software, from a single supplier. EMC's Greenplum division, since spun off as part of Pivotal, was the first to offer a Hadoop appliance, but Oracle, Teradata, IBM, and Microsoft have since followed suit. Appliances may require a minimum half-rack commitment, so they may not be ideal for initial experimentation. Several appliances have fringe benefits, including shared management software and analytic-database or NoSQL-database options.

Not all Hadoop deployments run on middle-of-the-road hardware. Cray and SGI have options to deploy Hadoop on high-performance computing clusters. And with big data being, by definition, a power-intensive pursuit, experiments are underway with low-power servers and next-generation ARM chips that may lure at least some Hadoop users away from the hegemony of x86 servers.

Read on for a look at the server and appliance offerings dominating Hadoop deployments, as well as a few of the fringe offerings bringing new twists to the world of Hadoop hardware.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ... View Full Bio

Comments

D. Henschen (Author), 4/24/2014 1:03:34 PM
You're probably not looking at hardware if you're thinking cloud
I agree it would be interesting to look at cloud deployment options for Hadoop, but that belongs in a separate collection without the word "hardware" attached. Cloud capacity is undoubtedly the hands-down winner where price-for-performance and Hadoop are concerned. Few if any enterprises can buy, provision, and run at Amazon or even Rackspace or IBM SoftLayer economies of scale.

If you're looking at hardware, I'd submit it's not because you think you can achieve a lower TCO than renting virtual racks from AWS. You're choosing to deploy on premises because that's where your organization wants to keep its data (for security, regulatory or other reasons) and it has the people and data-center capacity to explore the opportunity. Maybe you experimented with Hadoop in the cloud, but now you're ready to build an on-premises cluster. This collection is for you.
D. Henschen (Author), 4/24/2014 10:27:15 AM
X86 has a lock... for now
Talking to execs at Cloudera and Hortonworks, it's pretty clear that Hadoop is 99.9% deployed on x86 today, with Intel providing the vast majority of the CPUs. Given those reports, it was interesting to read yesterday that IBM is planning to run Hadoop on its next-gen Power8 chips. That will no doubt require a few tweaks to the Hadoop software, which IBM can certainly do with its own BigInsights distribution, but I wonder whether anybody will follow suit.

The whole idea with Hadoop is to rely on the software to gain power and redundancy by harnessing many low-cost servers. The entry price quoted for a Power server is $7,500, whereas low-end x86 rack servers start at $2,500. Having many low-cost CPUs, rather than fewer, more powerful ones, is the design point of Hadoop software. I'm sure we'll see a good debate as IBM tries to give Hadoop a blue hue.