Commentary
7/21/2014
01:42 PM
Doug Henschen

Oracle Big Data SQL: 5 Key Points

Oracle's new big data tool won't cover all the analysis bases, but it will enable SQL-savvy professionals to query Hadoop and NoSQL sources.

Oracle announced last week that it will open up access to Hadoop and NoSQL data with Oracle Big Data SQL, a feature to be added to the Oracle Big Data Appliance in the third quarter. The new tool has some limitations, as this article describes, but the good news is that it will enable Oracle Database shops to take better advantage of big data using existing skills and expertise.

We were on the right track last week when we compared Oracle Big Data SQL to Teradata Query Grid and Microsoft PolyBase. All three technologies are about SQL querying across databases and big data platforms, and all three ultimately move data to the vendor's respective SQL database. But there are differences under the hood that will matter to Oracle customers. We'll get to these nuances in a moment, but what's encouraging is that Oracle is not presenting this SQL tool like a hammer and all big-data-analysis challenges like nails. The idea is simply to enable SQL-trained professionals to do as much as possible with information from Hadoop and NoSQL sources from the familiar environs of Oracle Database.

Like many Oracle customers, we watched last week's Oracle Big Data SQL launch presentation and heard about all the advantages of this feature. In a follow-up interview with Oracle executives Dan McClary, product manager, and Neil Mendelson, VP of product management, we asked about limitations and got more detail on how this feature works. We also got a frank assessment of what Oracle Big Data SQL can and can't do. For example, McClary and Mendelson were clear in saying that Oracle Big Data SQL is not a SQL-on-Hadoop tool intended to replace Hive, Impala, or other analysis options that operate exclusively on Hadoop.

Oracle Big Data SQL was used to create this geospatial correlation of Twitter sentiment data stored on Hadoop with customer profitability data managed in Oracle Database.

Here, then, are five key points would-be customers should know about Oracle Big Data SQL:

1. Access is limited to Oracle's appliance, Cloudera's software, and, at first, Oracle NoSQL Database. Oracle Big Data SQL is a feature of the Oracle Big Data Appliance, so that's the only place it can run. At this point it's not planned to be available as stand-alone software for use with Hadoop deployed on non-Oracle hardware. What's more, Oracle execs said there are no plans to make it run on any Hadoop distribution other than Cloudera -- the software bundled with the Oracle Big Data Appliance.

The feature will also be limited to working with the Oracle NoSQL Database, which is the other software component in the Oracle Big Data Appliance bundle. Here, at least, there are plans to open up access to non-Oracle products, including Cassandra, HBase, and MongoDB.

"The Hadoop community has been very good about coming up with data storage handlers for Hive, so we'll use those to consume data from a number of other NoSQL data stores," McClary explained. This move is "at the top of our list," he said, but it will have to wait for a subsequent release of Oracle Big Data SQL.

The sooner Oracle can add support for the most popular NoSQL databases the better. Teradata Query Grid, by contrast, offers direct access to MongoDB. As for the limitation of working only with the Oracle Big Data Appliance and Cloudera software, we think Oracle should rethink this approach, as many companies have deployed Cloudera and other Hadoop distributions without using Oracle's appliance. Teradata Query Grid and Microsoft PolyBase are not limited to specific big data appliances or Hadoop distributions. Why not bundle Oracle Big Data SQL with Oracle Database instead of the Big Data Appliance?

2. Oracle Smart Scan minimizes data movement. Oracle made a virtue of necessity when it developed the Smart Scan feature for the Exadata appliance. The technology gave Oracle the power of distributed processing at a storage-tier level, boosting scalability without changing Oracle Database itself.

Smart Scan effectively prescreens data on the storage tier and brings only the relevant data up to the database level. Oracle Big Data SQL will run Smart Scan on Hadoop using the metadata generated by Hive. Once again the feature minimizes data movement, in this case from Hadoop to Oracle Database.
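The prescreening idea can be illustrated with a minimal Python sketch. This is not Oracle's implementation: the `StorageNode` class, the function names, and the sample rows are all hypothetical, chosen only to show how filtering at the storage tier changes the number of rows that have to travel to the database tier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Row = dict


@dataclass
class StorageNode:
    """Hypothetical storage-tier node holding raw rows (a stand-in for a Hadoop data node)."""
    rows: List[Row]

    def scan_all(self) -> List[Row]:
        # No prescreening: every row is shipped to the database tier.
        return list(self.rows)

    def scan_filtered(self, predicate: Callable[[Row], bool],
                      columns: Iterable[str]) -> List[Row]:
        # Prescreening: apply the filter and the column projection locally,
        # so only relevant rows and columns are shipped.
        cols = list(columns)
        return [{c: r[c] for c in cols} for r in self.rows if predicate(r)]


def query_without_pushdown(nodes, predicate, columns):
    moved = [r for n in nodes for r in n.scan_all()]
    result = [{c: r[c] for c in columns} for r in moved if predicate(r)]
    return result, len(moved)  # rows moved = everything stored


def query_with_pushdown(nodes, predicate, columns):
    moved = [r for n in nodes for r in n.scan_filtered(predicate, columns)]
    return moved, len(moved)  # rows moved = only the matches


# Made-up sample data: three nodes with 100 rows each.
nodes = [
    StorageNode([
        {"id": i, "region": "west" if i % 4 == 0 else "east", "payload": "x" * 10}
        for i in range(start, start + 100)
    ])
    for start in (0, 100, 200)
]


def wanted(row: Row) -> bool:
    return row["region"] == "west"


plain_result, plain_moved = query_without_pushdown(nodes, wanted, ["id", "region"])
pushed_result, pushed_moved = query_with_pushdown(nodes, wanted, ["id", "region"])
print(plain_moved, pushed_moved)
```

Both functions return the same answer; the difference is only in how many rows cross from the storage tier to the database tier, which is the cost this kind of prescreening is meant to reduce.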

During Oracle's launch presentation, McClary shared the example of correlating Twitter data from Hadoop with customer transaction data in Oracle Database. Smart Scan first filtered out Tweets without discernible sentiments, eliminating more than 50% of the original data, and it then eliminated Tweets that lacked latitude and longitude information. The final subset represented less than 1% of the total Twitter stream in Hadoop, cutting data movement to Oracle Database (and thus query time) by more than 99%. All of this was accomplished with a single SQL query, according to McClary, and the final result was visualized with a map (shown above) pinpointing sentiment correlated with sales profitability by location.
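To give a rough sense of what such a single query might look like, here is a sketch that uses Python's built-in `sqlite3` module and an in-memory SQLite database as a stand-in. It is not Oracle Big Data SQL syntax, and the table names, column names, and toy rows are assumptions invented for illustration; none of them come from Oracle's presentation. The `WHERE` clause carries the two filters described above (sentiment present, coordinates present), and the join attaches profitability by location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for Tweets stored on Hadoop.
    CREATE TABLE tweets (id INTEGER, sentiment INTEGER, lat REAL, lon REAL, region TEXT);
    -- Hypothetical stand-in for profitability data held in the database.
    CREATE TABLE region_profit (region TEXT, profit REAL);
""")

# Toy rows: some Tweets lack a sentiment score, some lack coordinates.
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 37.77, -122.42, "SF"),
        (2, None, 37.77, -122.42, "SF"),
        (3, -1, None, None, None),
        (4, 1, 40.71, -74.00, "NYC"),
        (5, None, None, None, None),
        (6, -1, 40.71, -74.00, "NYC"),
        (7, None, 40.71, -74.00, "NYC"),
        (8, 1, 37.77, -122.42, "SF"),
    ],
)
conn.executemany(
    "INSERT INTO region_profit VALUES (?, ?)",
    [("SF", 120.0), ("NYC", 80.0)],
)

# One query: filter out unusable Tweets, then correlate with profitability.
rows = conn.execute("""
    SELECT t.region, COUNT(*) AS tweets, AVG(t.sentiment) AS avg_sentiment, p.profit
    FROM tweets AS t
    JOIN region_profit AS p ON p.region = t.region
    WHERE t.sentiment IS NOT NULL
      AND t.lat IS NOT NULL
      AND t.lon IS NOT NULL
    GROUP BY t.region, p.profit
    ORDER BY t.region
""").fetchall()

# How many Tweets survive the two filters, out of the total stored.
total = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
kept = conn.execute("""
    SELECT COUNT(*) FROM tweets
    WHERE sentiment IS NOT NULL AND lat IS NOT NULL AND lon IS NOT NULL
""").fetchone()[0]

for row in rows:
    print(row)
print(f"{kept} of {total} tweets survive the filters")
```

In this toy data half of the Tweets survive the filters. In the example McClary described, the surviving share was less than 1%, which is where the savings in data movement come from.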

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
Comments
SachinEE,
User Rank: Ninja
7/29/2014 | 9:05:58 AM
Re: Well in
The Oracle data tool will not be able to cover all the analysis bases, but the comforting thing is that its advantages outweigh its disadvantages. The other good thing is that it will open up access to Hadoop data in the third quarter. The fascinating thing is that it will enable Oracle Database shops to make use of big data with the help of their existing expertise. The data storage feature will also be increased, meaning that users will be able to get more space to save their data. The Oracle team only needs to elaborate more on the capabilities of MapReduce and YARN.
D. Henschen,
User Rank: Author
7/21/2014 | 3:14:32 PM
SQL and the post-MapReduce Era
There has been a lot said about moving on from MapReduce recently, but I think the big data world needs a clearer understanding of the capabilities of MapReduce, MapReduce 2.0 on YARN, and the alternatives. As for the Oracle Big Data SQL alternative, Oracle offered the example of correlating Twitter data with customer information, as described above, but I'm hoping to hear about many more examples of what's possible at Oracle Open World (which will, no doubt, be when this feature sees its general release).