Are we rising to the peak of the big data hype cycle, or are we headed into the trough of disillusionment?
Your position on that Gartner curve depends on your own company's progress. Has your company identified any use cases for big data analytics? Have you kicked the tires on new platforms such as Hadoop? If you've gotten this far, it's a good bet you've also developed a wish list of big data capabilities or of problems you've yet to solve. It's this wish list that stands between just storing a pile of useless information and unlocking valuable business insights.
The techniques discussed here -- distributed computing, stream processing, machine learning, graph analysis -- promise to increase analytics performance, affordability and accessibility. With distributed computing and stream processing, companies are taking on analytics work that demands unprecedented scale and speed -- like a bank sizing up every bit of data it has on a customer in a split second to serve more relevant ads on a website. Machine learning is taking on complex analyses as well. For example, Memorial Sloan-Kettering Cancer Center is experimenting with machine learning to continually monitor medical literature and offer cancer treatment suggestions to supplement doctors' assessments.
And we're witnessing the emergence of open source technologies, including Apache Hadoop and R, that let companies use larger and more diverse data types, and apply them to new business analysis problems. Mutual fund company American Century, for example, is writing its own R-based models that use graph analysis techniques to map connections among companies -- much like Facebook studies connections among people -- to improve its forecasts of financial results.
At this point, IT's wish list for the next-generation analytics market is long. Most companies still want to see proven analytical tools and methods rather than beta-stage projects. They want easy and familiar SQL or SQL-style analysis, not limited query capabilities and batchy, far-from-real-time performance. The piles of data keep growing, and the variety of data sources companies want to make sense of keeps expanding. Meanwhile, analytics startups are trying to address the shortcomings of emerging big data platforms such as Hadoop. So what follows is but an interim report on the latest and most promising efforts to make sense of the data.
Open Source Filling The Gaps
Apache Hadoop, the distributed data processing framework now synonymous with big data, is widely accepted as a platform for building high-scale, distributed computing applications. Hadoop lets organizations store huge volumes and varieties of data quickly without all the management work demanded by relational databases. Still to be worked out, however, are the best use cases and techniques for running analytics on top of Hadoop.
Download the March 25, 2013, issue of InformationWeek.