
Big Data's 2 Big Years

During the past two years, Hadoop hit the big time, in-memory arrived, and practitioners saw data analysis deliver real results.

Two years might seem like a short time to be measuring shifts in technology, but in the fast-moving big-data arena, a lot has changed.

Just 26 months ago we published a collection of "12 Top Big Data Analytics Players," but changing times and technologies demanded this update: "16 Top Big Data Analytics Platforms." It's a fresh look at the data-management vendors offering the database management systems (DBMSs) and Hadoop platforms underpinning big-data analytics. (We did not include focused analytics vendors, such as Alpine Data Labs, Revolution Analytics, and SAS, nor NoSQL and NewSQL DBMS vendors, such as Couchbase, DataStax, and MongoDB, which deserve separate treatment.)

So what has changed in these two short years? Here are the three big factors.

1. Vendors expect Hadoop to be in the mix.
Practically every vendor out there has embraced Hadoop, going well beyond the fledgling announcements and primitive "connectors" that were prevalent two years ago. Industry heavyweights IBM, Microsoft, Oracle, Pivotal, SAP, and Teradata are all selling and supporting Hadoop distributions -- partnering, in some cases, with Cloudera and Hortonworks. Four of these six have vendor-specific distributions, Hadoop appliances, or both.


Traditionalists complain that Hadoop remains a slow, primitive, and disparate collection of systems mired in iterative, hard-to-manage, hard-to-code MapReduce processing. But 2013 brought a Hadoop 2.0 release that promises easier management of myriad workload types, extending beyond MapReduce to improved SQL querying, graph analysis, and stream processing. In fact, SQL-on-Hadoop products and projects have exploded over the last year, and vendor options now range from Cloudera Impala, IBM BigSQL, and Apache Drill to a higher-performance Hive. They also include Pivotal HAWQ and InfiniDB engines running on HDFS, Polybase data exploration in Microsoft SQL Server, and HP Vertica and Teradata SQL-H exploration of HDFS with help from HCatalog.

2. Low-latency expectations are on the rise.
With steady improvements in processing power and declining costs for performance, in-memory and even streaming processing speeds are increasingly in demand. SAP has been the most prominent champion here with its Hana platform, but IBM has introduced BLU Acceleration for DB2, and Microsoft and Oracle are preparing in-memory options for their flagship databases. Among data warehousing and data mart specialists, Teradata and others also are making the most of RAM by offering options for high RAM-to-disk ratios and providing ways -- automated in Teradata's case -- to pin the most-queried data into RAM.

In the Hadoop arena, projects such as Spark and Storm are pursuing in-memory and streaming performance at high scale for breakthrough applications in ad delivery, content personalization, and mobile geo-location services. 

3. Practitioners get the big-data religion.
The third -- and most important -- change in the big-data arena over the last two years has been in the awareness of practitioners. Tech buyers have opened their eyes to the falling cost of storing and using data and to the tremendous opportunities for them to make use of this information. Here are examples I've heard of in the past few days:

  • A payroll and benefits-management company that depends on processing fees now recognizes that it's sitting on a trove of data on hiring, salary, career, and economic trends, and analyzing that could become a new revenue source.
  • A WiFi infrastructure provider to retailers lives on razor-thin margins amid plenty of competition, so it's looking into ways to give retailers insight on the customers tapping into the hot spots. Such analysis could show how long people linger in a store and, combined with customer profile data, reveal whether customer profiles and traffic patterns are changing by store location.
  • A car manufacturer is streaming auto-performance data from satellite-connected vehicles so it can better understand service, maintenance, and warranty trends by model, with the potential to trigger proactive service recommendations.
  • An agricultural retailer is selling data on what seeds customers are buying and planting by store location.

We're also hearing a lot of hype and inflated claims around big data and, yes, the term itself has its flaws. But where there's smoke there's fire. We've recently reported on companies, including Amadeus, MetLife, Paytronix, and The Weather Company, that are seeing big returns on their big-data investments. Over the next two years we expect many of today's fledgling big-data projects to lead to very real and repeatable business successes. As always, we're ready to share the success stories and dissect the disasters.
