Oracle Big Data SQL: 5 Key Points - InformationWeek


Commentary
7/21/2014
01:42 PM
Doug Henschen

Oracle Big Data SQL: 5 Key Points

Oracle's new big data tool won't cover all the analysis bases, but it will enable SQL-savvy professionals to query Hadoop and NoSQL sources.

Oracle announced last week that it will open up access to Hadoop and NoSQL data with Oracle Big Data SQL, a feature to be added to the Oracle Big Data Appliance in the third quarter. The new tool has some limitations, as this article describes, but the good news is that it will enable Oracle Database shops to take better advantage of big data using existing skills and expertise.

We were on the right track last week when we compared Oracle Big Data SQL to Teradata Query Grid and Microsoft PolyBase. All three technologies are about SQL querying across databases and big data platforms, and all three ultimately move data to the vendor's respective SQL database. There are differences under the hood that will matter to Oracle customers. We'll get to these nuances in a moment, but what's encouraging is that Oracle is not presenting this SQL tool like a hammer and all big-data-analysis challenges like nails. The idea is simply to enable SQL-trained professionals to do as much as possible with information from Hadoop and NoSQL sources from the familiar environs of Oracle Database.


Like many Oracle customers, we watched last week's Oracle Big Data SQL launch presentation and heard about all the advantages of this feature. In a follow-up interview with Oracle executives Dan McClary, product manager, and Neil Mendelson, VP of product management, we asked about limitations and got more detail on how this feature works. We also got a frank assessment of what Oracle Big Data SQL can and can't do. For example, McClary and Mendelson were clear in saying that Oracle Big Data SQL is not a SQL-on-Hadoop tool intended to replace Hive, Impala, or other analysis options that operate exclusively on Hadoop.

Oracle Big Data SQL was used to create this geospatial correlation of Twitter sentiment data stored on Hadoop with customer profitability data managed in Oracle Database.

Here, then, are five key points would-be customers should know about Oracle Big Data SQL:

1. Access is limited to Oracle's appliance, Cloudera's software, and, at first, Oracle NoSQL Database. Oracle Big Data SQL is a feature of the Oracle Big Data Appliance, so that's the only place it can run. At this point it's not planned to be available as stand-alone software for use with Hadoop deployed on non-Oracle hardware. What's more, Oracle execs said there are no plans to make it run on any Hadoop distribution other than Cloudera -- the software bundled with the Oracle Big Data Appliance.

On the NoSQL side, the feature will initially work only with the Oracle NoSQL Database, which is the other software component in the Oracle Big Data Appliance bundle. Here, at least, there are plans to open up access to non-Oracle products, including Cassandra, HBase, and MongoDB.

"The Hadoop community has been very good about coming up with data storage handlers for Hive, so we'll use those to consume data from a number of other NoSQL data stores," McClary explained. This move is "at the top of our list," he said, but it will have to wait for a subsequent release of Oracle Big Data SQL.

The sooner Oracle can add support for the most popular NoSQL databases the better. Teradata Query Grid, by contrast, offers direct access to MongoDB. As for the limitation of working only with the Oracle Big Data Appliance and Cloudera software, we think Oracle should rethink this approach, as many companies have deployed Cloudera and other Hadoop distributions without using Oracle's appliance. Teradata Query Grid and Microsoft PolyBase are not limited to specific big data appliances or Hadoop distributions. Why not bundle Oracle Big Data SQL with Oracle Database instead of the Big Data Appliance?

2. Oracle Smart Scan minimizes data movement. Oracle made a virtue of necessity when it developed the Smart Scan feature for the Exadata appliance. The technology gave Oracle the power of distributed processing at a storage-tier level, boosting scalability without changing Oracle Database itself.

Smart Scan effectively prescreens data on the storage tier and brings only that which is relevant up to the database level. Oracle Big Data SQL will run Smart Scan on Hadoop using the metadata generated by Hive. Once again the feature minimizes data movement, in this case from Hadoop to Oracle Database.

During Oracle's launch presentation, McClary shared the example of correlating Twitter data from Hadoop with customer transaction data in Oracle Database. Smart Scan first filtered out Tweets without discernable sentiments, eliminating more than 50% of the original data, and it then eliminated Tweets that lacked latitude and longitude information. The final subset represented less than 1% of the total Twitter stream in Hadoop, cutting data movement to Oracle Database (and thus query time) by 99%. All of this was accomplished with a single SQL query, according to McClary, and the final result was visualized with a map (shown above) pinpointing sentiment correlated with sales profitability by location.
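To make the idea concrete, here is a minimal Python sketch of storage-side filtering in the spirit of the Twitter example above. It is an illustration only, not Oracle's implementation or API: the `Tweet` record, the `storage_side_scan` function, and the toy data are all hypothetical names and values we made up. The point is simply that rows are discarded where they are stored, so only the relevant subset travels to the database for the join.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Tweet:
    """Hypothetical record layout; not an Oracle or Hadoop schema."""
    text: str
    sentiment: Optional[float]  # None when no sentiment could be discerned
    lat: Optional[float]
    lon: Optional[float]


def storage_side_scan(tweets: Iterable[Tweet]) -> Iterator[Tweet]:
    """Apply both filters where the data lives and yield only relevant rows."""
    for tweet in tweets:
        if tweet.sentiment is None:
            continue  # first filter: drop tweets without a discernible sentiment
        if tweet.lat is None or tweet.lon is None:
            continue  # second filter: drop tweets lacking latitude/longitude
        yield tweet


def fraction_moved(total: int, moved: int) -> float:
    """Share of the stored rows that actually cross over to the database."""
    return moved / total if total else 0.0


# Toy data standing in for the Twitter stream stored on Hadoop (made up).
stored = [
    Tweet("love it", 0.9, 40.7, -74.0),
    Tweet("meh", None, 40.7, -74.0),
    Tweet("hate it", -0.8, None, None),
    Tweet("lunch", None, None, None),
]

moved = list(storage_side_scan(stored))
print(len(moved), fraction_moved(len(stored), len(moved)))
```

In this toy run only one of the four stored rows survives both filters. In the demo McClary described, the surviving share was less than 1% of the stream, which is where the reduction in data movement comes from.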

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
Comments
SachinEE,
User Rank: Ninja
7/29/2014 | 9:05:58 AM
Re: Well in
The oracle data tool will not be able to cover the entire analyses basis but the comforting thing is that its advantages are more than the disadvantages. The other good thing is that it will be able to open up access to Hadoop data in the third quarter. The fascinating thing is that it will enable the oracle data shops to be able to use bigger data with the help and their expertise. The data storage feature will also be increased meaning that the user will be able to get more space to save their data. The oracle team will only need to elaborate more on the capabilities of the map reduce and the yarn too, Plus their capabilities.
D. Henschen,
User Rank: Author
7/21/2014 | 3:14:32 PM
SQL and the post-MapReduce Era
There has been a lot said about moving on from MapReduce recently, but I think the big data world needs a clearer understanding of the capabilities of MapReduce, MapReduce 2.0 on YARN and alternatives. As for the Oracle Big Data SQL alternative, Oracle offered the example of correlating Twitter data with customer information, as described above, but I'm hoping to hear about many more examples of what's possible at Oracle Open World (which will, no doubt, be when this feature sees its general release).