The appliance is hitting the market sooner than many people expected it would, and it includes key software from Cloudera, the leading provider of Hadoop system management tools and support services.
When Oracle announced its Big Data Appliance at October's Oracle Open World, the company offered no release dates or details about its planned distribution of open-source Apache Hadoop software. Some took that as a sign that Oracle was stalling. But by releasing the product early in the year in partnership with Cloudera, which has more customers and years in the market than any other Hadoop software and services provider, Oracle has made it clear that it is wasting no time and taking no chances with unproven technology.
"Cloudera brings us a couple of very important missing pieces, including its management software and assistance for a deeper second- and third-tier level of support," said George Lumpkin, Oracle's vice president of product management, data warehousing.
Oracle will provide first-line support for the appliance and all software (including the Hadoop distribution and Cloudera Manager) through its case-tracking support infrastructure. But when particularly tough support cases arise, Oracle will tap Cloudera's expertise, Lumpkin said. What's more, Oracle will refer customers to Cloudera for Hadoop training and consulting engagements.
The Oracle Big Data Appliance software bundle will include Cloudera's Distribution of Apache Hadoop and Cloudera Manager, its administration and management console for Hadoop. As announced in October, the appliance will also include an open-source distribution of R software and the Oracle NoSQL database. R is used for predictive analytics and statistical modeling while the NoSQL product is a transactional, key-value store database capable of interpreting new data on the fly without a predefined relational schema.
Customers will be able to configure and use the software as they see fit. Lumpkin said Oracle expects many customers will run Hadoop exclusively, while others will run the NoSQL database, or use both products simultaneously. The main attraction, however, is Hadoop, a data processing platform being embraced for its combination of scalability, flexibility, and low cost. Hadoop has become the default choice for Internet giants dealing with high-scale clickstream data, but it's now headed for wider use among many of Oracle's longstanding database customers.
From a hardware perspective, the Big Data Appliance will be offered exclusively in full-rack configurations, with each rack offering 864 gigabytes of main memory, 216 CPU cores, 648 terabytes of raw disk storage, and 40 gigabit-per-second InfiniBand internal connectivity between nodes. The hardware and software combined will sell for $450,000, with an annual support fee for both hardware and software of 12%. That's highly competitive, working out to less than $700 per terabyte of raw storage, in line with the low costs big data practitioners expect from deployments built on commodity hardware.
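The per-terabyte figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted above; it assumes the 12% support fee is calculated on the $450,000 list price, which the announcement does not spell out, and the function names are illustrative.

```python
# Figures quoted in the article for one full rack.
RACK_PRICE_USD = 450_000
RAW_STORAGE_TB = 648
ANNUAL_SUPPORT_PERCENT = 12  # assumed to apply to the list price


def cost_per_terabyte(price_usd, storage_tb):
    """Purchase price divided by raw disk capacity."""
    return price_usd / storage_tb


def annual_support_fee(price_usd, percent):
    """Yearly support fee as a percentage of the purchase price."""
    return price_usd * percent / 100


per_tb = cost_per_terabyte(RACK_PRICE_USD, RAW_STORAGE_TB)
support = annual_support_fee(RACK_PRICE_USD, ANNUAL_SUPPORT_PERCENT)

print(f"Cost per raw terabyte: ${per_tb:,.2f}")  # $694.44
print(f"Annual support fee: ${support:,.0f}")    # $54,000
```

Note that the calculation uses raw capacity; usable capacity after replication would be lower, so the effective cost per usable terabyte would be higher.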
"Oracle has put together a very comprehensive product that is priced very well," Kurt Dunn, Cloudera's chief operating officer, told InformationWeek. Where commodity deployments tend to be single-purpose Hadoop platforms, Dunn noted, the Big Data Appliance will combine Cloudera's Hadoop distribution and management software with a NoSQL database and R analytics software.
The partnership with Oracle is a big win for Cloudera, which will see its software promoted and distributed by the dominant database vendor. That said, Cloudera will continue to offer its software for deployment on third-party hardware, Dunn said, and the partnership with Oracle is nonexclusive.
The Oracle Big Data Appliance will immediately go toe to toe with the EMC Data Computing Appliance, which includes an EMC Greenplum Apache Hadoop distribution complemented by Hadoop management software from MapR. The EMC appliance also supports modular deployment of the SQL-based Greenplum relational database on the same box.
IBM introduced Hadoop-based InfoSphere BigInsights software in May, but it does not offer a related appliance. Microsoft has announced plans for a 2012 release of a Hadoop distribution tied to SQL Server 2012, with software developed by Hortonworks. Microsoft's release dates have yet to be set and there are no announced plans, as yet, for a related appliance.
Oracle also announced on Tuesday the release of Oracle Big Data Connectors software that will support the Big Data Appliance, as well as other Apache Hadoop-based systems. The software includes four key products. Oracle Loader for Hadoop uses MapReduce (the primary data-processing approach used in Hadoop) to load data into Oracle Database 11g. Oracle Data Integrator Application Adapter for Hadoop generates Hadoop MapReduce processes through what Oracle describes as an easy-to-use graphical interface. Oracle Connector R lets analysts using that analytics software mine data from the Hadoop Distributed File System (HDFS). Oracle Connector for HDFS supports SQL querying of Hadoop's file store using the Oracle Database SQL engine. Big Data Connectors will cost $2,000 per processor license.
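For readers unfamiliar with MapReduce, the approach mentioned above can be illustrated in plain Python. This is a conceptual, single-machine sketch only: it does not use Hadoop or any Oracle connector API, and the function names and sample click log are hypothetical. It shows the three stages (map, shuffle, reduce) applied to the kind of clickstream data Hadoop is often used for.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def click_mapper(line):
    """Emit (page, 1) for each 'user,page' log line (hypothetical format)."""
    _user, page = line.split(",")
    yield page, 1


def count_reducer(_key, values):
    """Sum the per-click counts for one page."""
    return sum(values)


# Hypothetical sample data, for illustration only.
sample_log = ["alice,/home", "bob,/home", "alice,/pricing"]
page_hits = reduce_phase(shuffle(map_phase(sample_log, click_mapper)), count_reducer)
print(page_hits)  # {'/home': 2, '/pricing': 1}
```

In a real Hadoop cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the sketch keeps everything in memory to show only the data flow.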
Oracle highlighted the Big Data Appliance as a complement to a growing family of "engineered systems" that now includes Exadata, Exalogic, and the Exalytics In-Memory Machine. But what's more remarkable, comments Gartner analyst Merv Adrian, is the fact that Oracle is finally looking beyond its core database. Oracle's TimesTen and Essbase databases, which were recently upgraded for use in the Exalytics appliance, and BerkeleyDB, which was Oracle's development starting point for the new NoSQL database, are examples of that shift.
"Oracle is suddenly beginning to act as a data-management portfolio company, not just a company with a big brother and a bunch of starving siblings," Adrian said.