Hadoop: From Experiment To Leading Big Data Platform
The sixth annual Hadoop Summit, held this week in Silicon Valley, will highlight Hadoop's evolution from backroom science project to mainstream big data manager.
Apache Hadoop has come a long way since the first Hadoop Summit took place in 2007. From its humble origins as a promising open-source framework for managing data-intensive distributed applications, Hadoop has mushroomed into the leading big data platform, one doing real work at Fortune 500 corporations.
This year's Hadoop Summit, co-sponsored by Yahoo and Hortonworks, takes place June 26-27 in San Jose, Calif. The two-day event is expected to draw 2,500 to 3,000 attendees and will feature more than 90 breakout sessions on all things Hadoop, according to John Kreisa, vice president of strategic marketing for Hortonworks.
"I've been working with the technology for three or four years now, and over that time Hadoop has gone from the experimental, 'We've got a test cluster set up,' to 'OK, here's what we're going to do with it,'" Kreisa told InformationWeek.
The theme of this year's conference is Hadoop's "maturation," spotlighting the platform as a key component of the next generation of data architectures. "Effectively, Hadoop has matured now as a technology such that mainstream enterprises are using it for a wide variety of workloads," Kreisa said. Summit attendees will hear presentations from major corporations, including Cardinal Health, Home Depot, and Kohl's, that are using Hadoop for real workloads.
Despite Hadoop's growing popularity in the enterprise, however, it has its shortcomings, most notably a reputation for being difficult to use. There's also the problem of what to do with all that big data once you've collected it.
As InformationWeek's Doug Henschen writes, "In contrast to NoSQL, Hadoop seems to be getting all the credit it deserves and then some. By many accounts, it's the be-all and end-all of big data, despite the fact that the lion's share of deployments today are little more than digital landfills."
Kreisa counters that "digital landfill" is an interesting analogy, but not one that represents what he's seeing in the enterprise. "The term that we hear companies using, large financial services and telecommunications [firms], is 'data lake' or 'data reservoir,'" he said, adding that these organizations are able to "spin out" new analytic applications based on the data they're collecting.
Kreisa does acknowledge, however, that Hadoop has "a few rough edges that need to be sanded off," particularly in the areas of deployment and manageability. "These things continue to evolve," he said. "Hadoop is a large distributed system with lots of moving parts. A modern Hadoop platform will have 10 or 12 open-source projects as subcomponents."
Hadoop is arguably the best-known and most widely used big data management platform, but it certainly isn't the only option for enterprises. Should its proponents be worried?
"I don't see any serious competitors to Hadoop," Kreisa said. "There are lots of other technologies that fill different workload components, and part of it comes down to the underlying file system."
He continued, "Generally speaking, HDFS, the Hadoop Distributed File System, has almost really won the battle. If you look at other architectures, where people may try to replace the query engine on top of it … HDFS is still the underlying place where that data is coming to rest."
There's still a significant need for Hadoop training, Kreisa added, which in part is what this week's Summit is all about. "There needs to be growth in skills, because again, it's a complex distributed storage system that's not like the other things that people are using today."