Big data project leaders still hunger for some key technology ingredients. Starting with SQL analysis, we examine the top five wants and the people working to solve those problems.
Wish 1: SQL Analysis At Big-Data Scale

You could compile a massive data set just by gathering all the stories and reports that have been written about the shortage of big-data talent. The most acute need is for data scientist types who know data and who also know how to write custom code, MapReduce jobs, and algorithms to gain insights from big data. But what if SQL-savvy professionals schooled in relational databases and business intelligence (BI) and analytics tools could do more of the heavy lifting? There are many more SQL professionals out there than there are data scientists, and most SQL pros would be eager to expand their career potential.
There's a big push to deliver SQL-analysis capabilities on top of Hadoop, and the talent shortage is just one reason. The second reason for the trend is that Apache Hive, Hadoop's incumbent data warehousing infrastructure, offers a limited subset of SQL-like query capabilities and suffers from slow performance tied to behind-the-scenes MapReduce processing.
Answering the call for broader, faster SQL querying on Hadoop are projects and initiatives including Cloudera Impala, EMC's HAWQ query feature on the Pivotal HD distribution, Hortonworks Stinger, IBM Big SQL, MapR-supported Apache Drill, and Teradata SQL-H.
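To make the contrast between SQL and hand-coded MapReduce concrete, here is a minimal Python sketch, assuming a hypothetical table of (region, amount) sales records. It computes the same per-region totals twice: once as a single SQL `GROUP BY`, run against an in-memory SQLite database as a stand-in for a SQL-on-Hadoop engine, and once as hand-written map, shuffle and reduce functions of the kind a MapReduce job spells out. It only illustrates the two programming models; it is not how Hive, Impala or any of the engines named above actually plan or execute queries.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (region, sale amount) records.
SALES = [
    ("east", 120),
    ("west", 75),
    ("east", 30),
    ("north", 50),
    ("west", 25),
]


def total_by_region_sql(rows):
    """Declarative version: one query a SQL/BI analyst could write."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        result = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)


def map_phase(rows):
    """Map: emit a (key, value) pair for each input record."""
    for region, amount in rows:
        yield region, amount


def shuffle_phase(pairs):
    """Shuffle/sort: bring all values that share a key together."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reduce_phase(grouped):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(value for _, value in group) for key, group in grouped}


def total_by_region_mapreduce(rows):
    """Imperative version: the same result, assembled step by step."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(total_by_region_sql(SALES))        # {'east': 150, 'north': 50, 'west': 100}
    print(total_by_region_mapreduce(SALES))  # {'east': 150, 'north': 50, 'west': 100}
```

The SQL version states only what result is wanted; the MapReduce version makes the programmer decide how records are keyed, grouped and combined. That difference is why SQL-on-Hadoop engines could let SQL professionals take on work that would otherwise require custom MapReduce code.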
Even the NoSQL camp wants better, SQL-like querying. Last year 10gen added a real-time data aggregation framework to its popular MongoDB NoSQL database. The aggregation framework lets users query data directly within MongoDB without writing and running complicated, batch-oriented MapReduce jobs. Acunu offers further evidence: it has developed AQL, a SQL-like language that supports querying on top of Cassandra.
The development of SQL querying capabilities is only the beginning. BI and analytics tools and systems native to big-data platforms are emerging. Examples include Datameer, Hadapt, Karmasphere, and Platfora, each offering its own distinctive mix of query, analysis, data-visualization, and monitoring capabilities on top of Hadoop.