Why so much attention to a company with a 1.0 product still in technology preview mode? First and foremost, Hadoop itself is a white-hot topic, drawing customers seeking a low-cost option for managing ever-growing and increasingly varied big-data stockpiles. Hadoop's collection of tools promises scalability, flexibility and low cost.
The scalability is tied to Hadoop's core distributed MapReduce processing approach. Flexibility stems from its schema-free design, which accepts data in any format. And with open-source software and compute capacity built on grids of commodity hardware, Hadoop clusters can be built on shoestring budgets.
Hadoop is already a default choice for Internet giants dealing with large-scale clickstream analysis. But the platform is now headed for wider use in all types of scenarios where data is encountered at scale and in great variety. What's sorely needed are enterprise-friendly monitoring and management tools as well as performance and high-availability features, elements Hortonworks hopes to help introduce purely within open-source software.
The attention to Hortonworks in particular has to do with pedigree, positioning, and, more recently, Cloudera's big partnership with Oracle. The pedigree part is about Hortonworks' status as a spinoff of Yahoo, which is the world's single largest user of Hadoop. In fact, Hadoop practically got its start at Yahoo: Doug Cutting, the original creator of the open source software, is a former Yahoo employee who credits the company with helping to develop and promote Hadoop in its earliest days back in 2006.
Spun out as a joint venture with Benchmark Capital in 2011, Hortonworks inherited a team of nearly 50 of Yahoo's earliest and most prolific contributors to the Apache Hadoop community. It's a powerful group of Hadoop influencers now employed by Hortonworks, which is undoubtedly why Microsoft called on the company's services last October to develop open-source software that could run Hadoop on Windows Server.
In terms of positioning, Hortonworks has committed to developing software entirely under Apache's open source license. "We think it's important not to have derivative works or off-ramps around Hadoop because that will slow development and fracture the market for Hadoop," Hortonworks executive Rob Bearden recently told me. Bearden was COO at the time of the interview, but he has since succeeded Eric Baldeschwieler, co-founder and now CTO, as Hortonworks' CEO.
By wearing a white hat and championing what's best for the entire Hadoop community, Hortonworks has won friends and, to a degree (outside of a bit of blog-based and social-network bravado), stifled sniping by would-be competitors such as Cloudera, EMC, and MapR. Like those three, Hortonworks plans to make money by backing up Hadoop with consulting, training and software support.
There are other revenue sources for these vendors. Hortonworks makes money developing Hadoop-compatible software for the likes of Microsoft. Cloudera and MapR charge for proprietary software that complements Hadoop. And EMC makes money on storage and appliance hardware as well as its Greenplum relational database.
Hortonworks isn't really battling tooth-and-nail for end-user customers as yet, but that will change once the company releases the Hortonworks Data Platform (HDP), its planned distribution of Apache Hadoop software. HDP 1.0 was released as a technology preview (beta) back in November, and it's expected to become generally available "within a matter of weeks," according to Shaun Connolly, Hortonworks' VP of corporate strategy. Hortonworks is also ramping up professional services, training, and consulting offerings that will butt heads with those of Cloudera, EMC, and MapR.
Hortonworks will have to step on a few toes sooner or later. Open-source altruism notwithstanding, Bearden's charge is to create a profit-making company, something he did as COO at both SpringSource and JBoss before those open-source-focused firms were bought by VMware and Red Hat, respectively.
This brings us to the third reason Hortonworks is getting so much attention: Cloudera's deal to provide the Hadoop software behind Oracle's Big Data Appliance. That move gave a long list of Oracle competitors good reason to partner with, or deepen existing partnerships with, a Cloudera alternative such as Hortonworks.
Teradata is an Oracle rival, so it came as no surprise when it announced last week that it will rely on Hortonworks to develop integrations and tools to exchange data between Hadoop and the Teradata and (Teradata-owned) AsterData relational databases.
In the alliance with Talend announced on Wednesday, Hortonworks will bundle the data-integration vendor's open-source Talend Open Studio For Big Data with HDP.
Hortonworks has partnerships with plenty of other data-integration vendors, including Informatica, Pentaho, and SnapLogic. But Talend's Open Studio will be distributable under the Apache open source license (like Hadoop itself) and includes connectors for five major Hadoop components: the Hadoop Distributed File System (HDFS), HBase, Pig, Sqoop and Hive. In short, the product will give users a free option for getting data into and out of Hadoop.
"The partnerships with Teradata, Microsoft and Talend are all leading indicators of where we're going," Hortonworks' Connolly told me yesterday. "It's about getting data into and out of Hadoop and making the platform as broadly useful as possible."