Oracle Big Data SQL: 5 Key Points - InformationWeek

01:42 PM
Doug Henschen

Oracle's new big data tool won't cover all the analysis bases, but it will enable SQL-savvy professionals to query Hadoop and NoSQL sources.

Oracle announced last week that it will open up access to Hadoop and NoSQL data with Oracle Big Data SQL, a feature to be added to the Oracle Big Data Appliance in the third quarter. The new tool has some limitations, described below, but the good news is that it will enable Oracle Database shops to take better advantage of big data using existing skills and expertise.

We were on the right track last week when we compared Oracle Big Data SQL to Teradata Query Grid and Microsoft PolyBase. All three technologies are about SQL querying across databases and big data platforms, and all three ultimately move data to the vendor's respective SQL database. There are differences under the hood that will matter to Oracle customers. We'll get to these nuances in a moment, but what's encouraging is that Oracle is not presenting this SQL tool as a hammer and every big-data-analysis challenge as a nail. The idea is simply to enable SQL-trained professionals to do as much as possible with information from Hadoop and NoSQL sources from within the familiar environs of Oracle Database.

Like many Oracle customers, we watched last week's Oracle Big Data SQL launch presentation and heard about all the advantages of this feature. In a follow-up interview with Oracle executives Dan McClary, product manager, and Neil Mendelson, VP of product management, we asked about limitations and got more detail on how this feature works. We also got a frank assessment of what Oracle Big Data SQL can and can't do. For example, McClary and Mendelson were clear in saying that Oracle Big Data SQL is not a SQL-on-Hadoop tool intended to replace Hive, Impala, or other analysis options that operate exclusively on Hadoop.

Oracle Big Data SQL was used to create this geospatial correlation of Twitter sentiment data stored on Hadoop with customer profitability data managed in Oracle Database.

Here, then, are five key points would-be customers should know about Oracle Big Data SQL:

1. Access is limited to Oracle's appliance, Cloudera's software, and, at first, Oracle NoSQL Database. Oracle Big Data SQL is a feature of the Oracle Big Data Appliance, so that's the only place it can run. At this point it's not planned to be available as stand-alone software for use with Hadoop deployed on non-Oracle hardware. What's more, Oracle execs said there are no plans to make it run on any Hadoop distribution other than Cloudera's -- the software bundled with the Oracle Big Data Appliance.

On the NoSQL side, the feature will initially be limited to working with Oracle NoSQL Database, which is the other software component in the Oracle Big Data Appliance bundle. Here, at least, there are plans to open up access to non-Oracle products, including Cassandra, HBase, and MongoDB.

"The Hadoop community has been very good about coming up with data storage handlers for Hive, so we'll use those to consume data from a number of other NoSQL data stores," McClary explained. This move is "at the top of our list," he said, but it will have to wait for a subsequent release of Oracle Big Data SQL.

The sooner Oracle can add support for the most popular NoSQL databases, the better. Teradata Query Grid, by contrast, offers direct access to MongoDB. As for the limitation of working only with the Oracle Big Data Appliance and Cloudera software, we think Oracle should reconsider this approach, as many companies have deployed Cloudera and other Hadoop distributions without using Oracle's appliance. Teradata Query Grid and Microsoft PolyBase are not limited to specific big data appliances or Hadoop distributions. Why not bundle Oracle Big Data SQL with Oracle Database instead of the Big Data Appliance?

2. Oracle Smart Scan minimizes data movement. Oracle made a virtue of necessity when it developed the Smart Scan feature for the Exadata appliance. The technology gave Oracle the power of distributed processing at a storage-tier level, boosting scalability without changing Oracle Database itself.

Smart Scan effectively prescreens data on the storage tier and passes only the relevant data up to the database level. Oracle Big Data SQL will run Smart Scan on Hadoop using the metadata generated by Hive. Once again the feature minimizes data movement, in this case from Hadoop to Oracle Database.
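The article doesn't describe how Smart Scan is implemented, but the general technique it names, evaluating the filter and the column selection where the data lives, is usually called predicate and projection pushdown. The Python sketch below models that idea under loose assumptions and is not Oracle's code: each "storage node" is just a list of row dictionaries, `node_scan`, `distributed_scan`, and `cells_moved` are hypothetical names, and data movement is approximated by counting the column values shipped to the database tier.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def node_scan(rows: Iterable[Row], predicate: Callable[[Row], bool],
              columns: List[str]) -> List[Row]:
    """Runs where the data lives (hypothetical storage node): keep only
    matching rows, and only the requested columns of those rows."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


def distributed_scan(nodes: List[List[Row]],
                     predicate: Callable[[Row], bool],
                     columns: List[str]) -> List[Row]:
    """Database side: gather the pre-screened rows from every node."""
    result: List[Row] = []
    for node_rows in nodes:
        result.extend(node_scan(node_rows, predicate, columns))
    return result


def cells_moved(rows: List[Row]) -> int:
    """Crude proxy for data movement: count of column values shipped."""
    return sum(len(r) for r in rows)


# Two hypothetical storage nodes, two rows each (invented sample data).
nodes = [
    [{"id": 1, "region": "west", "amount": 120, "note": "a"},
     {"id": 2, "region": "east", "amount": 40, "note": "b"}],
    [{"id": 3, "region": "west", "amount": 75, "note": "c"},
     {"id": 4, "region": "north", "amount": 10, "note": "d"}],
]

everything = [row for node in nodes for row in node]    # no pushdown
screened = distributed_scan(nodes, lambda r: r["region"] == "west",
                            ["id", "amount"])           # with pushdown
print(cells_moved(everything), cells_moved(screened))   # 16 4
```

Without pushdown, all 16 values cross the wire and filtering happens afterward; with pushdown, only 4 do, and the rows that reach the database are the same ones the query needed. That reduction in traffic is the effect the article attributes to Smart Scan on Hadoop.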

During Oracle's launch presentation, McClary shared the example of correlating Twitter data from Hadoop with customer transaction data in Oracle Database. Smart Scan first filtered out Tweets without discernible sentiment, eliminating more than 50% of the original data, and then eliminated Tweets that lacked latitude and longitude information. The final subset represented less than 1% of the total Twitter stream in Hadoop, cutting data movement to Oracle Database (and thus query time) by 99%. All of this was accomplished with a single SQL query, according to McClary, and the final result was visualized with a map (shown above) pinpointing sentiment correlated with sales profitability by location.
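In Oracle's demo the Twitter example ran as a single SQL query; the Python sketch below restates its logic step by step purely for illustration. Everything here is hypothetical: the tweet fields (`sentiment`, `lat`, `lon`), the 1-degree grid cell used as the location join key, the function names, and the sample numbers are invented, not taken from Oracle's demo or schema. `prefilter` stands in for the filtering done on the Hadoop side, and `sentiment_vs_profit` stands in for the correlation with profitability data in Oracle Database.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical tweet record: sentiment is None when no sentiment could be
# discerned; lat/lon are None when the tweet carries no location.
Tweet = Dict[str, Optional[float]]
Cell = Tuple[int, int]


def to_cell(lat: float, lon: float) -> Cell:
    """Bucket coordinates into a 1-degree grid cell (assumed join key)."""
    return (math.floor(lat), math.floor(lon))


def prefilter(tweets: List[Tweet]) -> List[Tweet]:
    """Hadoop-side step: drop tweets without a sentiment score, then drop
    tweets that lack latitude or longitude."""
    with_sentiment = [t for t in tweets if t["sentiment"] is not None]
    return [t for t in with_sentiment
            if t["lat"] is not None and t["lon"] is not None]


def sentiment_vs_profit(tweets: List[Tweet],
                        profit_by_cell: Dict[Cell, float]
                        ) -> Dict[Cell, Tuple[float, float]]:
    """Database-side step: average sentiment per cell, paired with that
    cell's profitability; cells with no profitability data are skipped."""
    scores: Dict[Cell, List[float]] = defaultdict(list)
    for t in tweets:
        scores[to_cell(t["lat"], t["lon"])].append(t["sentiment"])
    return {cell: (sum(s) / len(s), profit_by_cell[cell])
            for cell, s in scores.items() if cell in profit_by_cell}


# Invented sample data for illustration only.
tweets = [
    {"sentiment": 0.75, "lat": 37.7, "lon": -122.4},
    {"sentiment": None, "lat": 40.7, "lon": -74.0},
    {"sentiment": -0.5, "lat": None, "lon": None},
    {"sentiment": 0.25, "lat": 37.3, "lon": -122.9},
]
profit_by_cell = {(37, -123): 12000.0}

kept = prefilter(tweets)
share_moved = len(kept) / len(tweets)  # fraction of rows shipped onward
print(share_moved)                                # 0.5
print(sentiment_vs_profit(kept, profit_by_cell))  # {(37, -123): (0.5, 12000.0)}
```

Ordering matters only for cost, not for the result: both filters run before any row leaves the storage side, so only the located, sentiment-bearing tweets take part in the join.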

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...
D. Henschen
7/21/2014 | 3:14:32 PM
SQL and the post-MapReduce Era
There has been a lot said about moving on from MapReduce recently, but I think the big data world needs a clearer understanding of the capabilities of MapReduce, MapReduce 2.0 on YARN, and the alternatives. As for the Oracle Big Data SQL alternative, Oracle offered the example of correlating Twitter data with customer information, as described above, but I'm hoping to hear about many more examples of what's possible at Oracle OpenWorld (which will, no doubt, be when this feature sees its general release).