Oracle on Tuesday introduced an X3-2 Starter Rack version of its Big Data Appliance that will enable customers to scale out in six-node, 216-terabyte increments. It's an acknowledgement that Oracle's original, full-rack-only offering was out of sync with the data-scale realities of most organizations.
Introduced in January 2012, Oracle's Big Data Appliance supports both Hadoop and Oracle's NoSQL database. It combines Oracle Sun server and storage hardware with Cloudera's distribution of the Apache Hadoop framework, Cloudera Manager software, Oracle NoSQL Database Community Edition, Oracle R Distribution analytics software, Oracle Linux, the Oracle HotSpot Java Virtual Machine and an Oracle Enterprise Manager Plug-In.
The X3-2 Full Rack is big, packing 18 nodes, 288 processor cores and 216 3-TB drives for a total storage capacity of 648 TB. The price, including the software (which is mostly open source), is $450,000. But with Tuesday's announcement, Oracle is essentially offering three steps up. You can now start with the six-node, 1/3 Starter Rack, which is $160,000, and later add up to two six-node In-Rack Expansion units (also $160,000 each) to step up to full-rack capacity. Given that Hadoop typically stores three copies of all data, you can think of it as stepping up from 72 TB of usable capacity at six nodes to 144 TB at 12 nodes and 216 TB at 18 nodes with current drives.
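The capacity arithmetic above can be reproduced with a short Python sketch. It is only an illustration built on the figures quoted in this article: the 12-drives-per-node value is inferred from 216 drives across 18 nodes, the three-way replication factor is the "three copies" rule of thumb, and the function and constant names are made up for this example. It ignores file system overhead, compression and temporary space, and it assumes the quoted prices simply add up.

```python
# Rough capacity and price steps for the Big Data Appliance X3-2,
# using only figures quoted in the article. All names are illustrative.

DRIVES_PER_NODE = 12       # inferred: 216 drives / 18 nodes in the full rack
DRIVE_TB = 3               # 3-TB drives
REPLICATION = 3            # "Hadoop typically stores three copies of all data"
NODES_PER_STEP = 6         # Starter Rack and each In-Rack Expansion unit
STEP_PRICE_USD = 160_000   # quoted price for the Starter Rack and each expansion


def capacity(nodes):
    """Return (raw_tb, usable_tb) for a given node count."""
    raw_tb = nodes * DRIVES_PER_NODE * DRIVE_TB
    usable_tb = raw_tb // REPLICATION  # one logical copy out of three stored
    return raw_tb, usable_tb


def steps():
    """Return one row per step: Starter Rack, then one and two expansions."""
    rows = []
    for step in (1, 2, 3):
        nodes = step * NODES_PER_STEP
        raw_tb, usable_tb = capacity(nodes)
        rows.append({
            "nodes": nodes,
            "raw_tb": raw_tb,
            "usable_tb": usable_tb,
            # Assumes the quoted prices are simply additive.
            "cumulative_price_usd": step * STEP_PRICE_USD,
        })
    return rows


if __name__ == "__main__":
    for row in steps():
        print(row)
```

On those assumptions, three $160,000 steps total $480,000, compared with $450,000 for buying the full rack outright.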
"This is specifically focused on giving customers who are starting big data projects a smaller, less-expensive place to start," said George Lumpkin, Oracle's VP of data warehousing product management, in an interview with InformationWeek.
Granted, it has been only a year since the release of the Big Data Appliance, but you don't hear much about deployments. Lumpkin declined to say how many units have been sold, but he noted that there are (unnamed) customers in the telecom, automotive, financial services, travel services and data services industries (and you get the feeling it's not much more than one company in each of these industries). Requests for references turned up one video customer testimonial, from Thomson Reuters.
Most customers are using the appliance to examine new data types, typically complex and variably structured log data, with clickstreams and network activity logs the most common examples, according to Lumpkin. "Any organization that has a large online presence most likely has a relational data warehouse to store information about customers, products and purchases, but clickstream data lets them look at what products customers have looked at and what they've removed from a shopping cart before buying," he explained.
With its NoSQL database and analytics software, the appliance delivers more than just Hadoop. It also includes an Oracle Big Data Connector that lets you run SQL queries against data in HDFS. The tool uses Hive metadata but is not reliant on Hive-triggered MapReduce processing, according to Lumpkin. The NoSQL database, he said, is most often used to ingest data into the Hadoop cluster, to support Web-scale transaction processing or low-latency content lookups.
The primary competition to Oracle's appliance is build-your-own deployments combining Hadoop software from Cloudera, Hortonworks or MapR with commodity hardware, Lumpkin said. How does the appliance stack up on cost?
"When a customer takes into account the price of the individual components, plus the time saved on provisioning and tuning the system as well as support and upgrade effort, we come out extremely well in a cost comparison," Lumpkin said, noting that the software included with the appliance is configured and ready to run.
As for the competition with EMC and Teradata Hadoop appliances (and the just-announced IBM Hadoop appliance expected this summer), Lumpkin said Oracle stands out in being partnered with Cloudera.
"We partnered with the leading Hadoop distribution, and customers are often already using Cloudera or they are very receptive to using it," he said.
It's pretty clear that the sweet spot (and, arguably, the only spot) for Oracle's Big Data Appliance is Oracle shops, so recent upgrades, including the Oracle Enterprise Manager Plug-In, only make sense.
"Most of our customers are integrating the Big Data Appliance with Oracle Exadata, and in that environment they want one management console spanning their ecosystem," Lumpkin said, noting that Cloudera Manager is for running the Hadoop software and jobs, while Oracle Enterprise Manager oversees the hardware within the appliance.
It's not unusual for Oracle to start big and then move down. Exadata, for example, originally bottomed out at a 1/4 rack configuration, but Oracle last year introduced a smaller 1/8 rack configuration. And Oracle's experience with Hadoop does not seem out of step with the rest of the market; a visit to the Hadoop Wiki reveals that plenty of companies using Hadoop have fewer than 10 nodes.
Oracle can technically link together as many as 18 Oracle Big Data Appliances. But it's pretty clear that the petabyte-scale Hadoop practitioners out there are building their own clusters and even building their own hardware, as in the case of Facebook, Goldman Sachs, Fidelity and other participants in the Open Compute Project.
With more than 300,000 Oracle database customers, the company doesn't have to worry about running out of prospects. But it sure could use more big data customer success stories.