We analyze the important news from SAS, Hortonworks, MetaScale, and others at the Strata conference, as big data seeks a productive next chapter.
O'Reilly's Strata 2014 conference is in full swing in Santa Clara, Calif., this week, and show organizers are turning a page with the conspicuous absence of the term "big data" from the major themes and conference tracks. It's another sign that people are ready to go beyond the comic book version of what's happening with data.
"Making Data Work" is the aspirational theme of this year's conference, and the tracks promise a more nuanced novella with topics including "Connected World" (Internet of Things), "Data in Action" (real-world case studies), "Data Science" (skills, techniques, and strategies), "Ethics, Policy, and Privacy" (can we actually do anything about these?), "Design" (data-visualization and interfaces), and "Hadoop and Beyond" (tools and technologies).
Many vendors making announcements at Strata have yet to pick up on the emphasis on productivity over hyperbole. The big-data buzz talk seems to be ladled into press releases in inverse proportion to what can be stated about specific capabilities and, more importantly, named customers citing real-world business benefits.
We'll skip the news here, therefore, about venture capital rounds and stealth companies and focus instead on nine more notable announcements from Strata in three categories:
Analytics at Scale
SAS In-Memory Statistics for Hadoop: SAS has progressed from an Access connector to Hadoop to delivering SAS Visual Analytics and SAS High-Performance Analytics products capable of running on Hadoop. This week's news is SAS In-Memory Statistics for Hadoop, which takes advantage of the vendor's capabilities to perform data analysis on high-scale, in-memory clusters.
SAS In-Memory Statistics for Hadoop, to be released in the first half of this year, will enable multiple users to "simultaneously and interactively manage, explore, and analyze data, build and compare models, and score massive amounts of data in Hadoop." Selected data from Hadoop is loaded into memory once for iterative analysis across multiple users, avoiding time-consuming rounds of writing to and reading from disk.
SAS also promises to eliminate "a patchwork of tools" and "the need for different analytic programming languages," but this hints at a SAS-only world that might not go down well with open-source-minded Hadoop fans. Analysis options are said to include clustering, regression, generalized linear models, analysis of variance, decision trees, random decision forests, text analytics, and recommendation systems. We're eager to see how open this world might be and how it combines a memory cluster with a Hadoop cluster (or could they possibly be one and the same?).
The idea behind Chorus is to break down complex, iterative analytics workflows into discrete, understandable steps that can be shared with and controlled by business users. The goal is to eliminate the time-consuming back-and-forth between business users who know what they want and data wonks who were previously the only ones who could deliver results. Havas Media, the beta customer we interviewed in our report, said it gives business users and data analysts a shared workflow and "a common language" for analytic exploration. Chorus can do its distributed "in-cluster" work on top of Hadoop if you choose, avoiding data movement from your high-scale data store.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...