5 Top Wishes For Big Data Deployments
If you've even experimented with building big-data applications or analyses, you're probably acutely aware that the domain has its share of missing ingredients. We've boiled it down to five top wants on the big-data wish list, starting with SQL (or at least SQL-like) analysis options and shortcuts to deployment and advanced analytics and finishing with real-time and network analysis options.
The good news is that people and, in some cases, entire communities, are working on these problems. There are armies of data-management and data-analysis professionals who are familiar with SQL, for example, so organizations naturally want to take advantage of knowledge of that query language to make sense of data in Hadoop clusters and NoSQL databases -- the latter is no paradox, as the "No" in "NoSQL" stands for "not only" SQL. It's not a surprise that every distributor of Apache Hadoop software has proposed, is testing, and has or will soon release an option for SQL or SQL-like analysis of data residing on Hadoop clusters. That group includes Cloudera, EMC, Hortonworks, IBM, MapR and Teradata, among others. In the NoSQL camp, 10Gen has improved on the analytics capabilities within MongoDB, and commercial vendor Acunu does the same for Cassandra.
Deploying and managing Hadoop clusters and NoSQL databases is a new experience for most IT organizations, but it seems that each and every software update brings new deployment and management features expressly designed to make life easier. There are also a number of appliances -- available or planned by the likes of EMC, HP, IBM, Oracle and Teradata -- aimed at fast deployment of Hadoop. Other vendors are focusing on particularly tricky aspects of working with Hadoop framework components. WibiData, for example, provides open-source libraries, models and tools designed to make it easier to work with HBase, Hadoop's high-scale NoSQL database.
The whole point of gathering up and making use of big data is to come up with predictions and other advanced analytics that can trigger better-informed business decisions. But with the shortage of data-savvy talent in the world, companies are looking for an easier way to support sophisticated analyses. Machine learning is one technique that many vendors and companies are investigating because it relies on data and compute power, rather than human expertise, to spot customer behaviors and other patterns hidden in data.
One of the key "Vs" of big data (along with volume and variety) is velocity, but you'd be hard pressed to apply the phrase "real-time" to Hadoop, with its batchy MapReduce analysis approach. Alternative software distributor MapR and analytics vendor HStreaming are among a small group of firms bringing real-time analysis of data in Hadoop. It's an essential step that other vendors -- particularly event-stream processing vendors -- are likely to follow.
Last among the top five wishes for big data is easier network analysis. Here, corporate-friendly graph-analysis databases and tools are emerging that employ some of the same techniques Facebook uses at truly massive scale. Keep in mind that few of the tools and technologies described here have had 30 or more years to mature, like relational databases and SQL query tools have. But there are clear signs that the pain points of big-data management and big-data analysis are rapidly being addressed.