12/19/2013
09:10 AM
Cloudera Goes Cloud With Amazon

Cloudera says Amazon Web Services partnership will help customers move beyond Hadoop sandboxes to bring high-scale production platforms into the cloud.

Cloudera and Amazon Web Services (AWS) announced on Thursday a partnership that promises simple and efficient ways to quickly bring production-scale, Hadoop-based Cloudera deployments into the cloud.

Many customers already run Cloudera's software on AWS, but they do so in a two-step process in which they license and download the software from Cloudera and then deploy it on AWS cloud infrastructure. That two-step approach is still in place as of today, but the partnership calls for it to be collapsed into a single step. The new approach will not only speed and ease deployment but also bring Cloudera's entire stack to the cloud for production use, according to Tim Stevens, Cloudera's VP of business and corporate development.

"The cloud use cases to date have typically been for research and development, testing, and proof-of-concept experimentation," said Stevens in a phone interview with InformationWeek. "Now we're bringing a certified and supported, enterprise-data-hub experience to the Amazon cloud."

"Enterprise data hub" is Cloudera's vision for the broad use of its platform as the first destination and center of data-management activity within enterprises. Where test-and-dev deployments on AWS have been limited to Cloudera's core open-source Hadoop distribution, Stevens said the partnership will bring more extensive deployments including Cloudera Manager, Cloudera Impala, and Cloudera Search.

[Want more on the enterprise data hub vision? Read "Cloudera Plans Data Hub Role For Hadoop."]

What kinds of companies are interested in running high-scale, production Hadoop clusters in Amazon's cloud? The most obvious candidates are Internet firms and media companies running on AWS infrastructure and storing vast quantities of data in its cloud. But large telecommunications companies, financial services firms, and retailers are also looking to bring at least some of their data into the cloud, says Stevens.

"Retailers, for example, have systems to capture and analyze purchases taking place in stores, and they have separate systems to capture and analyze purchases taking place online," he explained. "Some retailers want to marry those two datasets together in the cloud while others want to do it on-premises." Those leaning toward the cloud seek agility and want to stay out of the business of running Hadoop clusters.

Cloudera isn't the first Hadoop distributor to forge cloud partnerships. MapR struck up relationships with both Amazon and Google in 2012, and MapR instances have been available directly on AWS for more than a year. Hortonworks has partnered with Microsoft, and the Windows-based version of its software has been available as a service on Azure since early this year. IBM, Pivotal, and Teradata have also introduced cloud deployment options for their Hadoop offerings.

Cloudera ramped up its cloud activity in October with the introduction of Cloudera Connect: Cloud. That program counts Savvis (a CenturyLink company), SoftLayer (an IBM company), T-Systems, and Verizon Cloud as service partners, and Amazon is now joining the club.

Cloudera will provide the first line of support for AWS deployments, and it will use its Cloudera Manager systems-management software to determine whether any problems experienced are related to software or hardware.

"If it's a hardware or infrastructure problem, we have a hotline into the Amazon support structure to be able to address that right away," says Stevens.

Will most enterprise cloud deployments of Cloudera's software continue in the proof-of-concept mold? Only time will tell as companies weigh agility and low-touch infrastructure management against the cost, control, and perceived security of storing high-scale data in the cloud.

Doug Henschen is executive editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor-in-chief of Intelligent Enterprise, editor-in-chief of Transform Magazine, and executive editor at DM News.

Comments
D. Henschen,
User Rank: Author
12/19/2013 | 1:03:50 PM
Why didn't this happen sooner?
I recall attending Hadoop World in 2009 (500-ish people at the Roosevelt Hotel) and Amazon had just released Elastic MapReduce. Why it took this long to see a Cloudera-AWS combo is beyond me, but it's a good thing whatever egos got in the way are now in check.
cbabcock,
User Rank: Author
12/19/2013 | 1:00:10 PM
Cloud economics spur Cloudera, AWS joint decision
A Hadoop service on Amazon, backed by the tools and expertise of Cloudera? I can see why Amazon is willing to partner and make that a one-step process. Think more Hadoop clusters running on EC2. And Hadoop users are prime beneficiaries of the cloud's elasticity. Spin it up when you need it; shut it down when you don't. People who believe there's no economic benefit to cloud computing should study how companies use Hadoop.