Oracle's Cloudera-powered Hadoop box gains new authentication, audit and query options courtesy of the Oracle software stack.
It goes without saying that a new Oracle Big Data Appliance announced on Tuesday features CPU and storage upgrades. But the real story is in the software, Oracle said. The list of upgrades includes Kerberos authentication, Oracle Audit Vault-integrated tracking, and new R- and XQuery-based data-analysis options.
The Big Data Appliance is one of Oracle's so-called engineered systems that preconfigures software and hardware so you can plug it in, set a few settings, load data, and start working. The software includes Cloudera's distribution of Hadoop and the Oracle NoSQL database, while the hardware bundles servers, storage, and networking capacity that ties everything together.
The latest version of this package, the Oracle Big Data Appliance X4-2, moves up to Intel's latest, more powerful x86 processors. Oracle also switched from 3-TB to 4-TB hard drives, which accounts for a 33% increase in total storage capacity to 864 TB for a full rack. You can start with a 1/3 rack and scale up in 1/3-rack increments, adding racks as needed.
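The capacity figures above can be checked with a little arithmetic. The following is a minimal sketch, not anything published by Oracle: the function names are hypothetical, and the previous-generation figure assumes the drive count per rack stayed the same across generations, which the article implies but does not state.

```python
# Figures quoted in the article for the Big Data Appliance X4-2.
FULL_RACK_TB = 864   # total capacity of a full rack
OLD_DRIVE_TB = 3     # drive size in the previous generation
NEW_DRIVE_TB = 4     # drive size in the X4-2


def capacity_increase(old_drive_tb: int, new_drive_tb: int) -> float:
    """Fractional capacity gain from swapping drives one for one."""
    return (new_drive_tb - old_drive_tb) / old_drive_tb


def previous_full_rack_tb(current_tb: int, old_drive_tb: int, new_drive_tb: int) -> float:
    """Implied capacity of the earlier rack.

    Assumption: the number of drives per rack did not change.
    """
    return current_tb * old_drive_tb / new_drive_tb


def rack_capacity_tb(thirds: int) -> float:
    """Capacity when buying in 1/3-rack increments (3 thirds = one full rack)."""
    if thirds < 1:
        raise ValueError("at least one 1/3-rack increment is required")
    return FULL_RACK_TB * thirds / 3


if __name__ == "__main__":
    print(f"Increase: {capacity_increase(OLD_DRIVE_TB, NEW_DRIVE_TB):.0%}")
    print(f"Implied previous full rack: {previous_full_rack_tb(FULL_RACK_TB, OLD_DRIVE_TB, NEW_DRIVE_TB):.0f} TB")
    for thirds in (1, 2, 3):
        print(f"{thirds}/3 rack: {rack_capacity_tb(thirds):.0f} TB")
```

Under that assumption, the 33% gain comes entirely from the larger drives, and the smallest 1/3-rack configuration works out to 288 TB.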
On the software side, Oracle said the Big Data Appliance includes Cloudera's entire software stack, which has been greatly expanded since the last major Big Data Appliance release. In addition to Cloudera's CDH distribution of open source Hadoop, the bundle includes Cloudera Manager (deployment and administration), Cloudera Impala (SQL query), Cloudera Search, HBase, Cloudera Backup and Disaster Recovery, and Cloudera Navigator (auditing and access management).
Beyond the engineered system, George Lumpkin, Oracle's VP of data warehousing technologies, said the company is thinking about the larger architecture for big data. As evidence he points to a few tweaks that Oracle has added into the mix. These include new Kerberos authentication controls for the appliance and integrations between Hadoop and the Oracle Audit Vault & Database Firewall product. The Audit Vault tie will enable customers who have this product to audit Oracle databases and the Big Data Appliance from a single console.
Oracle also announced that it's working with Cloudera on its open source Sentry project to develop additional layers of security for Hadoop. "So we're tackling the authentication problem with Kerberos, we're tackling the auditing problem with Audit Vault and we're tackling access-control for Hadoop by working with the open source community," Lumpkin told InformationWeek.
On the query front, Oracle has added XQuery for Hadoop, technology borrowed from Oracle Database that supports querying of XML documents and JSON documents in the Hadoop Distributed File System (HDFS). The company has also enhanced its R Connector for Hadoop, which is designed to apply R language analytics in a scalable, distributed way to data in HDFS. The enhancements cover additional statistical methods including generalized linear models and factor-analysis algorithms.
You could argue that Oracle rivals IBM, Microsoft, and EMC-spinoff Pivotal have gone to much greater lengths to support big-data analysis by developing their own Hadoop distributions and software. IBM's BigInsights, for instance, includes IBM-developed query tools including BigSheets and BigSQL. Pivotal has also developed new query options for the platform with its HAWQ SQL-on-Hadoop interface. Microsoft collaborated with Hortonworks to create a Hadoop distribution for Windows that works with Microsoft System Center and Active Directory.
What rivals don't tend to do is publish clear-cut pricing, even in cases where they offer appliances. The Oracle Big Data Appliance X4-2, which is available immediately, is $525,000 for a full rack, including Oracle Linux, Oracle Java VM, the complete Cloudera stack and the Oracle NoSQL Database Community Edition. That price includes the first year of support for Cloudera's software, with first-line support provided by Oracle. Support after the first year is covered by a 12% annual maintenance fee (about $63,000). Support for NoSQL Community Edition or an upgrade to Enterprise Edition is charged separately.
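For budgeting purposes, the published numbers can be projected over several years. This is a rough sketch under stated assumptions: the function names are hypothetical, the maintenance fee is taken as 12% of the list price (consistent with the article's "about $63,000"), the fee is assumed not to change from year to year, and the separately charged NoSQL support or Enterprise Edition upgrade is excluded.

```python
# Figures quoted in the article; everything else here is illustrative.
LIST_PRICE_USD = 525_000      # full rack, includes first year of support
MAINTENANCE_PERCENT = 12      # annual fee after the first year


def annual_maintenance_usd(list_price: int = LIST_PRICE_USD,
                           percent: int = MAINTENANCE_PERCENT) -> int:
    """Yearly maintenance fee, assumed to be a flat percentage of list price."""
    return list_price * percent // 100


def cumulative_cost_usd(years: int,
                        list_price: int = LIST_PRICE_USD,
                        percent: int = MAINTENANCE_PERCENT) -> int:
    """Purchase price plus maintenance for every year after the first.

    Assumption: the fee stays constant. NoSQL support and any
    Enterprise Edition upgrade are charged separately and not included.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    return list_price + (years - 1) * annual_maintenance_usd(list_price, percent)


if __name__ == "__main__":
    print(f"Annual maintenance: ${annual_maintenance_usd():,}")
    for years in (1, 3, 5):
        print(f"{years}-year cost: ${cumulative_cost_usd(years):,}")
```

On these assumptions a full rack costs $651,000 over three years, since the first year's support is already included in the purchase price.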
Where Cloudera's recent "Enterprise Data Hub" vision is concerned, Oracle has a slightly different view on where things are headed. Cloudera now casts Hadoop as the first place that companies will store all data -- including the full detail and extended history of customer and transactional information. Lumpkin said Hadoop is more likely to store new data types including clickstreams, log files, and sensor and social data.
"The enterprise data warehouse is and will continue to be the primary store of customer data, sales transaction data, and product data," said Lumpkin. "The Big Data Appliance provides a repository for many of the newer, less-conventional types of data that haven't traditionally been the domain of the data warehouse."
Oracle declined to comment on the number of Big Data Appliances deployed or the split of those using Hadoop versus NoSQL or both.