New Oracle Advanced Analytics package speeds and scales R-language-based analyses inside the 11g database.
Oracle stepped up its big data analysis capabilities on Wednesday with the release of Oracle Advanced Analytics, a combination of two existing Oracle products with added features that let statistical analysts execute open source R analytics inside the widely used Oracle 11g database.
The Advanced Analytics package unites the venerable Oracle Data Mining product with Oracle R Enterprise, Oracle's release of the open source R analytics environment, which was introduced last month as part of the software bundle included with the Oracle Big Data Appliance.
"We view analytics as a pillar of big data, and with the Advanced Analytics option we'll enable users to run statistical computations, mathematical calculations, and predictive modeling in a more scalable fashion," said George Lumpkin, Oracle's vice president of data warehousing product management, in an interview with InformationWeek.
Oracle Data Mining and Oracle R Enterprise will continue to be offered separately, but the Advanced Analytics package integrates the two products, enabling the huge community of more than two million R programmers to run their analyses inside the Oracle database.
Analytics software is typically used on high-performance laptops, workstations, and dedicated servers. But as data volumes have grown, in-database processing techniques are gaining adoption because they eliminate time-consuming data-movement steps and take advantage of the scalability and processing power of the databases and hardware behind data marts and data warehouses.
Oracle Data Mining offers more than a dozen SQL-based analyses that can be performed inside 11g. The Advanced Analytics package opens up the entire R universe, including more than 3,000 community-developed analytical applications and more than 4,000 user-created packages that provide specialized statistical techniques, graphical devices, import/export capabilities, and reporting tools.
What's more, with the recent release of the Big Data Appliance, Oracle customers now have two options for taking advantage of R.
"We've given the statistician the ability to use the R client that they've always used to write and develop code that can be executed not just on data stored in memory on their laptop, but also on tables and views stored on 11g or on data in the Big Data Appliance," said Lumpkin.
The Advanced Analytics package will help Oracle catch up with competitors, including EMC Greenplum, IBM, Teradata, and SAS, that have been pushing the in-database processing envelope. EMC and Teradata, for instance, are partnered with SAS on its in-database SAS High-Performance Computing platform. IBM, too, has multiple in-database options and partnerships for its DB2 and Netezza databases.
SAS in particular has built out industry-specific analytic offerings, such as pricing, risk, and demand-planning apps, that run inside partner databases. Oracle Advanced Analytics is more of an infrastructure starting point on which organizations will be able to build purpose-built analytic apps or take advantage of existing R applications.
"I don't see Advanced Analytics as directly competing with SAS so much as appealing to those who have decided that they're going to use R," said IDC analyst Carl Olofson in an interview with InformationWeek.
Oracle Advanced Analytics is an optional add-on to 11g. Since the R components are open source, the package is offered at the same price as Oracle Data Mining on its own: $23,000 per processor. The option can be used whether the database is deployed in a conventional server or RAC environment or on the Oracle Exadata Appliance.
Oracle claimed in a statement that R will run 100 times faster in the Oracle database than in a customer's "current environment," meaning laptops or dedicated analytic servers.
The real question is how performance on Exadata will stack up against competitors like EMC Greenplum, IBM Netezza, and Teradata, which employ massively parallel processing (MPP) architectures. Exadata uses a separate storage tier to handle base-table processing and then returns small result sets to the database for analysis. Comparative performance will depend on the type of problem you're trying to solve, according to Olofson.
"If you're doing queries that are broad and unpredictable, where the data required might be spread across the entire database, an MPP solution should work better," Olofson said. "If most of the work centers on certain combinations of data, then you could argue that Oracle's approach should work at least as well if not better than MPP."