Commentary
3/1/2012
09:11 AM
Doug Henschen

Why Hadoop Crowd Is Hearing Much About Hortonworks

Hadoop super-influencer Hortonworks stacks up big data deals with Microsoft, Teradata, and Talend.

For a company that has yet to deliver its first product, it's remarkable just how prominent Hortonworks has managed to become in the emerging world of Hadoop. Last week it inked a deal with Teradata, on Tuesday it extended ties with Microsoft, and on Wednesday it announced it will add Talend's open source data-integration studio to its pending Apache Hadoop distribution.

Why so much attention to a company with a 1.0 product still in technology preview mode? First and foremost, Hadoop itself is a white-hot topic, drawing customers seeking a low-cost option for managing ever-growing and increasingly varied big-data stockpiles. Hadoop's collection of tools promises scalability, flexibility and low cost.

The scalability is tied to Hadoop's core distributed MapReduce processing approach. Flexibility stems from its schema-free design, which accepts data in any format. And with open-source software and compute capacity built on grids of commodity hardware, Hadoop clusters can be built on shoestring budgets.
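To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the "user page" clickstream line format are assumptions made up for illustration. What it shows is the shape of the approach: each raw record is mapped independently to key/value pairs, pairs are grouped by key, and each group is reduced on its own, which is why the work can be spread across many commodity machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (page, 1) for one raw line; the 'user page' format is hypothetical."""
    fields = record.split()
    # No schema is enforced up front: lines that do not fit are simply skipped.
    if len(fields) >= 2:
        yield fields[1], 1


def shuffle(pairs):
    """Group mapped values by key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key; here, a simple count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in one process (a real cluster distributes them)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["u1 /home", "u2 /pricing", "u1 /pricing", "malformed"]
    print(run_job(sample))
```

Because `map_phase` looks at one record at a time and `reduce_phase` looks at one key at a time, neither step needs to see the whole data set, which is the property that lets this style of job scale out.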

Hadoop is already a default choice for Internet giants dealing with large-scale clickstream analysis. But the platform is now headed for wider use in all types of scenarios where data is encountered at scale and in great variety. What's sorely needed are enterprise-friendly monitoring and management tools as well as performance and high-availability features, elements Hortonworks hopes to help introduce purely within open-source software.

The attention to Hortonworks in particular has to do with pedigree, positioning, and, more recently, Cloudera's big partnership with Oracle. The pedigree part is about Hortonworks' status as a spinoff of Yahoo, which is the world's single largest user of Hadoop. In fact, Hadoop practically got its start at Yahoo: Doug Cutting, the original creator of the open source software, is a former Yahoo employee who credits the company with helping to develop and promote Hadoop in its earliest days back in 2006.

Spun out as a joint venture with Benchmark Capital in 2011, Hortonworks inherited a team of nearly 50 of Yahoo's earliest and most prolific contributors to the Apache Hadoop community. It's a powerful group of Hadoop influencers now employed by Hortonworks, which is undoubtedly why Microsoft called on the company's services last October to develop open-source software that could run Hadoop on Windows Server.

The software needed to run Hadoop on Windows has been completed and is now in Apache's community review process. Hortonworks and Microsoft announced on Tuesday that the next step will be to add a Microsoft-friendly JavaScript development framework and an ODBC-standard connector for Hive, the data warehousing component of Hadoop. That connector will let people use Microsoft Excel to access Hadoop data, opening it up to millions of Excel users as well as to advanced users of the in-memory PowerPivot plug-in for Excel.

In terms of positioning, Hortonworks has committed to developing software entirely under Apache's open source license. "We think it's important not to have derivative works or off-ramps around Hadoop because that will slow development and fracture the market for Hadoop," Hortonworks executive Rob Bearden recently told me. Bearden was COO at the time of the interview, but he has since succeeded Eric Baldeschwieler, co-founder and now CTO, as Hortonworks' CEO.

By wearing a white hat and championing what's best for the entire Hadoop community, Hortonworks has won friends and, to a degree (outside of a bit of blog-based and social-network bravado), stifled sniping by would-be competitors such as Cloudera, EMC, and MapR. Like those three, Hortonworks plans to make money by backing up Hadoop with consulting, training, and software support.

There are other revenue sources for these vendors. Hortonworks makes money developing Hadoop-compatible software for the likes of Microsoft. Cloudera and MapR charge for proprietary software that complements Hadoop. And EMC makes money on storage and appliance hardware as well as its Greenplum relational database.

Hortonworks isn't really battling tooth-and-nail for end-user customers as yet, but that will change once the company releases its Hortonworks Data Platform (HDP), its planned distribution of Apache Hadoop software. HDP 1.0 came out as a technology preview (beta) back in November, and it's expected to become generally available "within a matter of weeks," according to Shaun Connolly, Hortonworks' VP of corporate strategy. Hortonworks is also ramping up professional services, training, and consulting services that will butt heads with Cloudera, EMC, and MapR.

Hortonworks will have to step on a few toes sooner or later. Open-source altruism notwithstanding, Bearden's charge is to create a profit-making company, something he did as COO at both SpringSource and JBoss before those open-source-focused firms were bought by VMware and Red Hat, respectively.

This brings us to the third reason Hortonworks is getting so much attention: Cloudera's deal to provide the Hadoop software behind Oracle's Big Data Appliance. That move created a long list of Oracle competitors who suddenly had good reason to partner with, or deepen existing partnerships with, a Cloudera alternative such as Hortonworks.

Teradata is an Oracle rival, so it came as no surprise when it announced last week that it will rely on Hortonworks to develop integrations and tools to exchange data between Hadoop and the Teradata and (Teradata-owned) Aster Data relational databases.

In the alliance with Talend announced on Wednesday, Hortonworks will bundle the data-integration vendor's open-source Talend Open Studio For Big Data with HDP.

Hortonworks has partnerships with plenty of other data-integration vendors, including Informatica, Pentaho, and SnapLogic. But Talend's Open Studio will be distributable under the Apache open source license (like Hadoop itself) and includes connectors for five major Hadoop components: the Hadoop Distributed File System (HDFS), HBase, Pig, Sqoop and Hive. In short, the product will give users a free option for getting data into and out of Hadoop.

"The partnerships with Teradata, Microsoft and Talend are all leading indicators of where we're going," Hortonworks' Connolly told me yesterday. "It's about getting data into and out of Hadoop and making the platform as broadly useful as possible."
