Will 2015 Be The 'Year Of Hadoop'? - InformationWeek

News | 6/10/2015, 12:06 PM


Hadoop advocates say analysis of unstructured data yields predictive analytics useful to healthcare, business, and Internet of Things machine maintenance.


Hadoop's role has shifted considerably since it first burst onto the scene. Today it is less a batch processing engine for crunching big data in bulk and more a platform on which a range of refined data skills can be exercised. Using it in all its capacities will prove "transformational to companies," said Rob Bearden, CEO of Hortonworks.

He was backed up by speakers from Microsoft, Forrester Research, and Enterprise Technology Research as the Hadoop Summit got underway Tuesday in San Jose, Calif. The conference runs through Thursday. Hadoop is making it possible to gather many dissimilar types of data and use them together in analytical processes that have been difficult or impossible to do in the past, the speakers said.

For one Hortonworks customer, Hadoop serves as the collection and analysis engine for self-reported information from a fleet of trucks. A major trucking firm set up a system to analyze lead acid battery performance and alert maintenance when any battery dipped to 15% capacity. Replacing a battery at that point eliminates most of the risk that a dead battery will cause a failed start, delaying the delivery of goods, said Bearden. Being able to monitor battery operations saved the company (which Bearden said he couldn't identify) $10 million in the first year the practice was implemented, he said.
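The monitoring Bearden describes amounts to a threshold alert over fleet sensor readings. A minimal sketch of that logic follows; the 15% cutoff comes from the article, but the data shapes, truck IDs, and function name are hypothetical, not the unnamed firm's actual system:

```python
# Hypothetical sketch of a battery-capacity threshold alert.
# Only the 15% cutoff comes from the article; everything else is illustrative.

ALERT_THRESHOLD = 0.15  # 15% of rated capacity triggers a maintenance alert

def batteries_needing_service(latest_readings):
    """Given {truck_id: capacity_fraction}, return trucks due for a battery swap."""
    return sorted(truck for truck, capacity in latest_readings.items()
                  if capacity <= ALERT_THRESHOLD)

readings = {"truck-101": 0.62, "truck-207": 0.14, "truck-311": 0.15}
alerts = batteries_needing_service(readings)  # ["truck-207", "truck-311"]
```

In a real deployment the readings would stream in continuously and land in the cluster before a job like this evaluates them.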

(Image: matdesign24/iStockphoto)

Bearden offered the battery anecdote in the opening keynote to 4,000-plus attendees at the eighth annual gathering of Hadoop users, developers, and enthusiasts. The first, in 2007, was attended by 150 people, he recalled.

[Want to learn more about Hadoop? See Hortonworks Deploys Hadoop Into Public Clouds.]

Hadoop started out as a distributed file system, HDFS, paired with the MapReduce engine, which scheduled processing on CPUs on or close to the cluster node where the data was stored. Today, thanks to the Apache Software Foundation's Hadoop-related projects (such as Spark), Hadoop also serves as a processor of real-time streaming data.
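The MapReduce pattern itself is simple to sketch outside Hadoop. The plain-Python illustration below shows the three phases -- map, shuffle, reduce -- that the framework distributes across cluster nodes; it is a teaching sketch of the pattern, not Hadoop's actual API:

```python
# Teaching sketch of the MapReduce pattern (word count), not the Hadoop API.
from collections import defaultdict

def map_phase(records):
    """Map step: each record emits (key, value) pairs -- here, (word, 1)."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle step: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate each key's grouped values."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores data", "spark processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))  # {"data": 2, ...}
```

Hadoop's contribution was never this logic; it was running each phase in parallel on the nodes holding the data, so the data rarely has to move.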

The result will be data engines that bear little resemblance to relational databases, or to much of anything that's come before. Thomas Davenport, writing in the Wall Street Journal's CIO Report June 3, said:

This new architecture involves not only Hadoop, but an entire series of new technologies. What they have in common is that many are open source, accommodate a wide variety of data types and structures, run on commodity hardware, [and] are somewhat challenging to manage.

"The shift is playing out in real time," said Bearden, quoting from the Davenport piece. The Hadoop open source software stack will unlock the relevant and valuable customer data needed for an interaction "before there's been a transaction," he said.

Predictive Results

Collecting data on truck batteries and other components is just one glimpse of what can be accomplished with the approaching Internet of Things, but it will require data engines capable of handling massive amounts of data. Hadoop and the systems built atop it -- such as Apache HBase, which uses the Hadoop Distributed File System to hunt for and sort small, valuable sets of data within a much larger collection -- can identify mechanical parts that are about to fail and alert companies. Proper maintenance of expensive machines, such as airplanes, locomotives, and wind turbines, will take on a different meaning: the machines are not allowed to fail under most operating conditions, rather than being fixed after they fail, Bearden said.
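HBase's knack for hunting down small, valuable sets of data inside a much larger collection rests on keeping row keys sorted, so a scan can jump straight to a key range instead of reading everything. A rough stand-alone sketch of that idea (pure Python with binary search; the row keys and values are invented, and this is not the HBase client API):

```python
# Sketch of a sorted-key range scan, the access pattern behind HBase reads.
# Row keys and values are illustrative, not a real HBase table.
import bisect

rows = sorted([
    ("part-0001", "ok"),
    ("part-0002", "ok"),
    ("part-0003", "failing"),
    ("part-0004", "ok"),
])
keys = [key for key, _ in rows]

def scan(start, stop):
    """Return rows whose key falls in [start, stop) -- a range scan, not a full pass."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return rows[lo:hi]

window = scan("part-0002", "part-0004")  # touches two rows, ignores the rest
```

Because the keys are kept in order, finding the window costs a binary search rather than a sweep over the whole collection -- the property that makes needle-in-haystack lookups cheap at scale.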

Using the Hortonworks Data Platform, a version of Hadoop, United Healthcare tapped into a variety of unstructured customer data to predict which patients were most likely to fail to take their medications. A patient with diabetes who doesn't take medications as directed is more likely to end up with complications, leading to a 37% increase in the cost of care, compared to one who takes medicines as directed and avoids complications. The ability to predict some patient behaviors is in its infancy, but it has the prospect of disrupting and transforming how healthcare is delivered, Bearden said.

"This entire paradigm (of predictive results) is in its infancy," he added.

Microsoft's T. K. Rengarajan, corporate VP for the Azure Data Platform, came to the stage to show a map of the radiation in the area around the Fukushima reactor in Japan after it suffered a meltdown following the March 2011 tsunami. No comprehensive picture of the radioactivity existed until a local agency gave 500 people Geiger counters and asked them to upload their readings to Microsoft's HDInsight data platform, which is based on Hortonworks' version of Hadoop. They did so, and for the first time a map of the varying levels of radiation was drawn for miles around the meltdown.

"What an amazing use of data that could change people's lives," he remarked.

Microsoft's HDInsight includes implementations of Apache Storm, Apache HBase, the Pig language for building Hadoop applications, Apache Hive data warehouse, Apache Sqoop for transferring data from relational databases into Hadoop, Apache Oozie workload scheduler, and Apache Ambari, an operational framework for managing Hadoop clusters.

Thomas DelVecchio, founder and director of Enterprise Technology Research, took the stage to declare: "2015 is the year that Hadoop open source took off. There's no better way to invest your spending priorities."

Michael Gualtieri, Forrester Research analyst, told the attendees: "If you're not an expert in predictive analytics, you need to get there." Eventually, 100% of all large enterprises will adopt some form of Hadoop, he predicted.

Actual uptake of Hadoop trails these predictions. Gartner estimated that 11% of large enterprises will invest in Hadoop in the next 12 months, another 7% in 24 months, while 26% have deployed it, experimented with it, or have a pilot project underway. Those figures, however, may change as the value of predictive analytics becomes more widely understood.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ... View Full Bio

Comments
Charlie Babcock (Author), 6/10/2015, 6:33:53 PM
Is Hadoop correctly called a data "operating system"?
Hadoop is no longer just a big data collector and sorting system. It may have been that initially, and a highly scalable one. But now it is sometimes referred to as a data platform or a data management operating system. Many different applications may run on it, including both SQL and non-SQL query systems.