The era of big data analysis is here to stay. Take your pick of 2014 proof points.
Tech watchers might cite the more than $200 million in venture capital raised by the top three NoSQL database vendors, or the $1 billion raised by the top three Hadoop software distributors. Many took note of the recent declaration by Forrester Research that "Hadoop is no longer optional" for large enterprises, thanks to compelling "Hadooponomics" that make it a must for high-scale storage and data processing.
InformationWeek is more impressed by the testimonials of companies that are getting real value out of big data platforms and analysis techniques. Pfizer and Merck, for example, are developing more effective and affordable drugs thanks to big data techniques that are leading to more targeted treatments and more productive manufacturing processes. GE and others are demonstrating improvements in industrial equipment performance, uptime, and safety thanks to Internet of Things-style applications.
And then there are the pioneers like The Weather Company and Facebook that say they just couldn't run their data-driven businesses without new platforms, even if they still have a place for more conventional tools like relational databases.
Here are five trends witnessed over the last year that point to progress in big data analysis:
1. SQL meets Hadoop
Hadoop is here to stay, so every data management vendor worth its salt must have a SQL-on-Hadoop or SQL-access-to-Hadoop option. Here are five of our most-read stories in the SQL-meets-Hadoop vein:
Just remember that SQL is not designed to find correlations among variably structured data sets. Nor does it support machine learning, many advanced analytics techniques, or other approaches often associated with big data analysis. If SQL solved everything, we wouldn't need new platforms.
2. Platforms mature
Every other week in 2014, or so it seems, Hadoop software distributors and NoSQL database vendors announced new management consoles, security systems, data management capabilities, search engines, or high-availability features. Here's a sampling of what we're talking about:
These and other big data vendors are trying to reassure enterprise IT types that their products are as secure and reliable as 30-year-old database management systems. Let's just say that more than a few grizzled IT veterans are still used to working with favored and familiar tools and still need some convincing.
3. Educational options proliferate
Nature abhors a vacuum, so into the void of data science and big-data platform knowledge and expertise have rushed vendors, headhunters, universities, MOOC providers, and others offering various forms of training to big-data-analysis wannabes. Here is a sampling of related developments in 2014:
With education and training opportunities flourishing, there will no doubt be waves of new talent available by the time we meet the graduating classes of 2015, 2016, and 2017.
4. Cloud options multiply
Hadoop, NoSQL databases, analytics tools and platforms: You name the technology and businesses are likely to start experiments in the cloud. And many will stay there, having no interest in deploying servers and administering software on premises. That's certainly true of small and midsize practitioners we've met. Vendors are responding to the demand. Here are a few of the notable cloud-oriented big data announcements made in 2014:
5. Focus turns to analysis
Data platforms will inevitably be commoditized. The real value in data is delivered through analysis, not by simply putting it all in a lake or on a data hub. Apache Spark offers a compelling promise: Machine learning, SQL, R-based analytics, graph network analysis, and streaming analysis all on one system. Support soared in 2014. Here's a sampling of related coverage this year:
Spark detractors (perhaps threatened) are whispering that Spark is too green or that niche alternatives (like Apache Storm for streaming analysis) might be better. Spark developer Databricks is responding with system tweaks and benchmark tests said to prove scalable performance.
Rest assured the real competitive battle in big data will be to lead in providing tools and capabilities for data analysis. Multiple commercial vendors (including Actian, Pivotal, and Teradata) seem to be aping the multi-analysis-engine platform strategy, and Cloudera's recent acquisition of DataPad, which offered a Python-based data analysis library, showed it's headed deeper into analytics.
Fasten your seatbelts -- it's going to be a competitive, and interesting, 2015.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...