News
4/24/2014 09:06 AM
Doug Henschen

10 Hadoop Hardware Leaders

Hadoop is known for running on "industry standard hardware," but just what does that mean? We break down popular options and a few interesting niche choices.

Hadoop software is designed to orchestrate massively parallel processing on relatively low-cost servers that pack plenty of storage close to the processing power. All the power, reliability, redundancy, and fault tolerance are built into the software, which distributes the data and processing across tens, hundreds, or even thousands of "nodes" in a clustered server configuration.
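Because the durability lives in the software layer, losing a node costs capacity rather than data. As a toy illustration of the idea (not Hadoop's actual, rack-aware placement logic), this Python sketch spreads three replicas of each block across distinct nodes and shows that no single-node failure can make a block unreadable:

    import random

    def place_blocks(num_blocks, num_nodes, replication=3):
        # Toy stand-in for HDFS block placement: each block gets `replication`
        # copies on distinct, randomly chosen nodes. (The real policy is
        # rack-aware; this only illustrates the redundancy principle.)
        return [set(random.sample(range(num_nodes), replication))
                for _ in range(num_blocks)]

    def blocks_lost(placement, failed_nodes):
        # A block is lost only if every one of its replicas sat on a failed node.
        return sum(1 for replicas in placement if replicas <= failed_nodes)

    placement = place_blocks(num_blocks=10_000, num_nodes=50)
    print(blocks_lost(placement, failed_nodes={7}))          # always 0
    print(blocks_lost(placement, failed_nodes={3, 12, 40}))  # almost always 0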

Those nodes are "industry standard" x86 servers that cost $2,500 to $15,000 each, depending on CPU, RAM, and disk choices. They're usually middle-of-the-road servers in terms of performance specs. A standard DataNode (a.k.a. worker node), for example, is typically a 2U rack server with two Intel Sandy Bridge or Ivy Bridge CPU sockets and a total of 12 cores. Servers are typically fitted with 64 GB to 128 GB of RAM. DataNodes usually have a dozen 2-TB or 3-TB 3.5-inch hard drives in a JBOD (just a bunch of disks) configuration. [Editor's note: The upward price range quoted above was raised to $15,000 (from $5,000) per server to reflect the inclusion of 12 high-capacity drives in addition to (typically) two standard disks per server.]
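Those specs lend themselves to back-of-the-envelope sizing. The Python sketch below is a rough aid, not a vendor formula: the threefold replication is the HDFS default, but the 25% scratch-space reserve and the example figures are illustrative assumptions.

    def usable_capacity_tb(nodes, drives_per_node=12, drive_tb=3,
                           replication=3, scratch_reserve=0.25):
        # Usable HDFS capacity in TB: raw disk, minus an assumed reserve for
        # MapReduce intermediate/spill files, divided by the replication factor.
        raw_tb = nodes * drives_per_node * drive_tb
        return raw_tb * (1 - scratch_reserve) / replication

    def nodes_needed(dataset_tb, **node_spec):
        # Smallest cluster whose usable capacity covers the dataset.
        nodes = 1
        while usable_capacity_tb(nodes, **node_spec) < dataset_tb:
            nodes += 1
        return nodes

    # Twenty of the 12 x 3-TB DataNodes described above:
    print(usable_capacity_tb(20))   # 180.0 TB usable out of 720 TB raw
    print(nodes_needed(500))        # 56 nodes to hold a 500-TB dataset

Swapping in drive_tb=4 models the maximum-capacity configuration described below.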

Companies seeking a bit more performance, for Spark in-memory analysis or Cloudera Impala, for example, might choose slightly higher clock speeds and 256 GB or more of RAM per server, while those seeking maximum capacity are choosing 4-TB hard drives.

Management nodes running Hadoop's NameNode (which coordinates data storage) and JobTracker (which coordinates data processing) require less storage but benefit from more reliable power supplies, enterprise-grade disks, RAID redundancy, and a bit more RAM. Connecting the nodes together is a job for redundant 10-Gigabit Ethernet or InfiniBand switches.
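The NameNode's memory appetite scales with the number of files and blocks it tracks rather than with raw data volume, which is why management nodes trade disk for RAM. A commonly cited rule of thumb (an approximation, not an official formula) puts the cost at roughly 150 bytes of NameNode heap per file, directory, or block object:

    BYTES_PER_NAMESPACE_OBJECT = 150  # rough heap cost per file/dir/block object

    def namenode_heap_gb(files, avg_blocks_per_file=1.5, dirs=0):
        # Approximate NameNode heap (GB) needed to track the HDFS namespace,
        # using the rule-of-thumb per-object cost above.
        objects = files + dirs + files * avg_blocks_per_file
        return objects * BYTES_PER_NAMESPACE_OBJECT / 1e9

    # 100 million files averaging 1.5 blocks apiece:
    print(round(namenode_heap_gb(100_000_000), 1))  # ~37.5 GB of heap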

It's not uncommon for huge Fortune 100 companies to buy so-called whitebox servers from no-name OEMs for less than $2,500 a crack in high volumes, but it's more typical for the average enterprise to work with Tier 1 vendors such as Cisco, Dell, HP, and IBM. All of these manufacturers now offer servers configured to match the Hadoop reference architectures published for Cloudera, Hortonworks, MapR, and other Hadoop distributions.

Hadoop practitioners often build their own clusters, but appliances have emerged over the last two years offering the convenience of buying everything, including preinstalled software, from a single supplier. EMC's Greenplum division, since spun off as part of Pivotal, was the first to offer a Hadoop appliance, but Oracle, Teradata, IBM, and Microsoft have since followed suit. Appliances may require a minimum half-rack commitment, so they may not be ideal for initial experimentation. Several appliances have fringe benefits, including shared management software and analytic-database or NoSQL-database options.

Not all Hadoop deployments run on middle-of-the-road hardware. Cray and SGI have options to deploy Hadoop on high-performance computing clusters. And with big data being, by definition, a power-intensive pursuit, experiments are underway with low-power servers and next-generation ARM chips that may lure at least some Hadoop users away from the hegemony of x86 servers.

Read on for a look at the server and appliance offerings dominating Hadoop deployments, as well as a few of the fringe offerings bringing new twists to the world of Hadoop hardware.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
JimBot, User Rank: Apprentice
6/2/2014 | 6:14:55 PM
Re: X86 has a lock... for now
Very interested in IBM's move into this space. Agree with the premise that Hadoop is designed to scale out... it will be very interesting to see how well that premise holds up as the market matures, scale increases, and total costs (not just acquisition costs) become better known. How easily the x86 Hadoop ecosystem moves and/or ports to Power8 is a huge (so far) unknown.
PSSCLabs, User Rank: Apprentice
4/26/2014 | 10:47:09 AM
PSSC Labs Should Be Included On This List
PSSC Labs is developing unique Big Data server platforms engineered specifically for Hadoop. The company offers the world's only enterprise-ready 1U server supporting 48 TB of storage. This revolutionary product, the CloudOOP 12000, is compatible with leading Hadoop distributions including MapR, Cloudera, and Hortonworks. The CloudOOP 12000 is already deployed in many production environments across a range of industry verticals. In fact, MapR itself is deploying 50 of these units for its own internal development cluster.

An added benefit of this unique platform is energy efficiency. Most configurations consume less than 250 watts at load, almost half the power draw of comparable systems from the other companies on this list.

PSSC Labs offers complete, turn-key, ready-to-run Hadoop clusters with its CloudRax platform. CloudRax supports nearly 2 PB in a single 42U rack.

For more information visit www.pssclabs.com
bitrefinery, User Rank: Apprentice
4/25/2014 | 7:38:26 PM
Re: You're probably not looking at hardware if you're thinking cloud
Actually, we have customers coming to us because we supply the dedicated hardware on private clusters. Most of them don't want to make the investment in the hardware, especially for a newer technology like this. Makes sense for them. AWS is great for spinning up nodes once a day and running calculations vs. a 24/7 cluster. Fun stuff.

- Eric, Bit Refinery
D. Henschen, User Rank: Author
4/24/2014 | 1:03:34 PM
You're probably not looking at hardware if you're thinking cloud
I agree it would be interesting to look at cloud deployment options for Hadoop, but that belongs in a separate collection without the word "hardware" attached. Cloud capacity is undoubtedly the hands-down winner where price-for-performance and Hadoop are concerned. Few if any enterprises can buy, provision, and run hardware at the economies of scale of Amazon, or even Rackspace or IBM SoftLayer.

If you're looking at hardware, I'd submit it's not because you think you can achieve a lower TCO than renting virtual racks from AWS. You're choosing to deploy on premises because that's where your organization wants to keep its data (for security, regulatory, or other reasons) and it has the people and data-center capacity to explore the opportunity. Maybe you experimented with Hadoop in the cloud, but now you're ready to build an on-premises cluster. This collection is for you.
ANON1246461923214, User Rank: Apprentice
4/24/2014 | 12:12:10 PM
Great article, What about Cloud?
Excellent article; it would have been interesting to balance these options against cloud alternatives such as AWS's EMR, Azure, etc.
D. Henschen, User Rank: Author
4/24/2014 | 10:27:15 AM
X86 has a lock... for now
Talking to execs at Cloudera and Hortonworks, it's pretty clear that Hadoop is 99.9% deployed on x86 today, with Intel providing the vast majority of the CPUs. Given those reports, it was interesting to read yesterday that IBM is planning to run Hadoop on its next-gen Power8 chips. That will no doubt require a few tweaks to the Hadoop software, which IBM can certainly do with its own BigInsights distribution, but I wonder whether anybody will follow suit.

The whole idea with Hadoop is to rely on the software to gain power and redundancy by harnessing many low-cost servers. The entry price quoted for a Power server is $7,500, whereas low-end x86 rack servers start at $2,500. Having many CPUs, rather than fewer, more powerful CPUs, is the design point of Hadoop software. I'm sure we'll see a good debate as IBM tries to give Hadoop a blue hue.