Raymie Stata, CEO of Altiscale, was CTO at Yahoo when Doug Cutting and other Yahoo developers put Hadoop to work indexing the Web. But there's already so much former Yahoo expertise in the Hadoop marketplace that Stata felt compelled to ask in an Altiscale blog post on June 12: "Does the world need yet another Hadoop startup from a bunch of former Yahoos?"
The answer, obviously, was yes. Altiscale's plan is to offer large-scale Hadoop as a service, and that approach became public June 19 at the GigaOm Structure show in San Francisco. Altiscale's product is still in private beta, but Stata said in an interview that it will be ready when the production version of Hadoop 2.0 comes out of the Apache Software Foundation. That's expected to happen in August.
Many companies have found Hadoop highly useful and are doing important work on 10- or 20-node clusters. Hadoop is inherently cluster software: it spreads unstructured data out across a cluster, then assigns each piece of processing work to a CPU close to the data it needs when it comes time to sort and process that data. "We want to target the folks who are using Hadoop on 10-20 node clusters," Stata explained. "They're using them successfully, and their usage is growing."
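For readers who want to see the data-locality idea in concrete form, here is a minimal sketch in Python. It is not Hadoop's scheduler or API; the function names (place_blocks, assign_tasks), the round-robin placement and the fixed number of task slots per node are all assumptions made for illustration. It shows only the general pattern: copy each block of data to a few nodes, then try to run each task on a node that already holds the block.

```python
def place_blocks(blocks, nodes, replication=2):
    """Hypothetical placement: copy each block to `replication` nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def assign_tasks(placement, nodes, slots_per_node=2):
    """Assign one task per block, preferring a node that already stores the block.

    Returns {block: (node, is_local)}. This is an illustrative policy,
    not the one Hadoop actually implements.
    """
    free = {node: slots_per_node for node in nodes}
    assignment = {}
    for block, holders in placement.items():
        # First choice: a node that holds the block and still has a free slot.
        candidates = [n for n in holders if free[n] > 0]
        if not candidates:
            # Fallback: any node with a free slot; the data must then cross the network.
            candidates = [n for n in nodes if free[n] > 0]
        if not candidates:
            raise RuntimeError("no free task slots left in the cluster")
        node = max(candidates, key=lambda n: free[n])  # pick the least-loaded candidate
        free[node] -= 1
        assignment[block] = (node, node in holders)
    return assignment
```

In this toy model, a small cluster with spare slots runs every task next to its data. As more work is packed onto the same nodes, more tasks fall through to the non-local fallback.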
But clusters are unlikely to grow at the same pace as users' appetite for more Hadoop analytics. As Hadoop clusters get bigger, managing them becomes more complicated and time-consuming until Hadoop users face diminishing returns on their effort, Stata said.
His firm wants to use large-scale Hadoop expertise to turn those 10- and 20-node users into 100- and 500-node users. "We want to build big Hadoop clusters [and make them available as an online service]," Stata said.
The future Altiscale service will make use of the Yarn resource manager, which is expected to be part of Hadoop 2.0. Yarn moves Hadoop beyond its one-job-at-a-time batch processing style of operation and allows it to run multiple applications simultaneously. With Yarn, Hadoop is no longer only a MapReduce system: resource management is separated from the processing model, so the cluster's CPU power can be allocated more dynamically among the different kinds of applications working on the data.
By catching the release of Hadoop 2.0, Altiscale hopes to bring enough new features to market to capture business from the many Hadoop companies already firmly established in the marketplace, such as Hortonworks and Cloudera.
Previously known as Verticloud, Altiscale changed its name to avoid disputes over trademark and copyright.