Hadoop: From Experiment To Leading Big Data Platform - InformationWeek

The sixth annual Hadoop Summit, held this week in Silicon Valley, will highlight Hadoop's evolution from backroom science project to mainstream big data manager.

Apache Hadoop has come a long way since the first Hadoop Summit took place in 2007. From its humble origins as a promising open-source framework for managing data-intensive distributed applications, Hadoop has mushroomed into the leading big data platform, one doing real work at Fortune 500 corporations.

This year's Hadoop Summit, co-sponsored by Yahoo and Hortonworks, takes place June 26-27 in San Jose, Calif. The two-day event is expected to draw 2,500 to 3,000 attendees and will feature more than 90 breakout sessions on all things Hadoop, according to John Kreisa, vice president of strategic marketing for Hortonworks.

"I've been working with the technology for three or four years now, and over that time Hadoop has gone from the experimental, 'We've got a test cluster set up,' to 'OK, here's what we're going to do with it,'" Kreisa told InformationWeek.

The theme of this year's conference is Hadoop's "maturation," spotlighting the platform as a key component of the next generation of data architectures. "Effectively, Hadoop has matured now as a technology such that mainstream enterprises are using it for a wide variety of workloads," Kreisa said. Summit attendees will hear presentations from major corporations, including Cardinal Health, Home Depot, and Kohl's, that are using Hadoop for real workloads.

Despite Hadoop's growing popularity in the enterprise, it has its shortcomings, most notably a reputation for being difficult to use. There's also the problem of what to do with all that big data once you've collected it.

As InformationWeek's Doug Henschen writes, "In contrast to NoSQL, Hadoop seems to be getting all the credit it deserves and then some. By many accounts, it's the be-all and end-all of big data, despite the fact that the lion's share of deployments today are little more than digital landfills."

Kreisa counters that "digital landfill" is an interesting analogy, but not one that represents what he's seeing in the enterprise. "The term that we hear companies using, large financial services and telecommunications (firms), is 'data lake' or 'data reservoir,'" he said, adding that these organizations are able to "spin out" new analytic applications based on the data they're collecting.

Kreisa does acknowledge, however, that Hadoop has "a few rough edges that need to be sanded off," particularly in the areas of deployment and manageability. "These things continue to evolve," he said. "Hadoop is a large distributed system with lots of moving parts. A modern Hadoop platform will have 10 or 12 open-source projects as subcomponents."

Hadoop is arguably the best-known and most widely used big data management platform, but it certainly isn't the only option for enterprises. Should its proponents be worried?

"I don't see any serious competitors to Hadoop," Kreisa said. "There are lots of other technologies that fill different workload components, and part of it comes down to the underlying file system."

He continued, "Generally speaking, HDFS, the Hadoop Distributed File System, has almost really won the battle. If you look at other architectures, where people may try to replace the query engine on top of it … HDFS is still the underlying place where that data is coming to rest."

There's still a significant need for Hadoop training, Kreisa added, which in part is what this week's Summit is all about. "There needs to be growth in skills, because again, it's a complex distributed storage system that's not like the other things that people are using today."