November 19, 2013
Big data is a popular topic these days, not only in the tech media, but also among mainstream news outlets. And October's official release of big data software framework Hadoop 2.0 is generating even more media buzz.
But while you, InformationWeek reader, clearly understand Hadoop's significance, there's a high probability that many people in your organization -- including more than a few managerial types in the C-suite -- aren't really sure what Hadoop is, what it does, or why it's important.
So, how do you explain Hadoop to non-geeks? One approach is to focus on the benefits of Hadoop and big data, rather than providing mind-numbing details (with forgettable acronyms) on how it all works.
Forrester analyst Mike Gualtieri took this "benefits" approach in June when he posted a brief tutorial video that provided an easy-to-grasp overview of Hadoop. He calls it a platform that makes big data easier to manage.
"To understand Hadoop, you have to understand two fundamental things about it," Gualtieri explained in his video. They are: How Hadoop stores files, and how it processes data.
He added: "Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on one particular node or server. So you can store very, very large files. It also lets you store many, many files."
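For readers who want to see the storage idea in miniature, here is a rough Python sketch of how a file too big for one machine might be cut into fixed-size blocks and spread across several nodes. The function names, the tiny block size, and the round-robin placement are illustrative assumptions, not Hadoop's actual code or placement policy. Hadoop's file system also keeps replicated copies of each block, which this sketch leaves out.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file's bytes into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list, nodes: list) -> dict:
    """Assign blocks to nodes round-robin (an assumption made for simplicity)."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        # Keep the block's position so the file can be rebuilt later.
        placement[node].append((index, block))
    return placement


def read_file(placement: dict) -> bytes:
    """Collect the blocks from every node and rejoin them in original order."""
    indexed = [pair for stored in placement.values() for pair in stored]
    return b"".join(block for _, block in sorted(indexed))


# A 10-byte "file" on hypothetical nodes that can each hold only 4 bytes.
file_bytes = b"0123456789"
layout = place_blocks(split_into_blocks(file_bytes, block_size=4),
                      ["node-a", "node-b", "node-c"])
restored = read_file(layout)
```

No single node in this toy cluster could hold the whole file, yet the file can still be stored and read back intact, which is the point Gualtieri is making.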
By focusing less on the jargon of Hadoop and big data, and more on the platform's real-world benefits, experts can effectively convey its value to business colleagues who do not have data-science backgrounds.
"Mainstream business users don't need to know how Hadoop works," Gualtieri told InformationWeek via email. "But they do need to understand that the constraints they once had on storing and processing data are removed when Hadoop is installed."
As a result, "the business can start thinking big again when it comes to data," he added.
The barrage of news reports on all facets of big data has helped introduce business people to Hadoop, even though a lot more education is needed. Those reports have covered big data's potential to fight various diseases, reduce government bureaucracy, locate terrorists, and, on a more mundane level, help businesses sell more stuff.
"There is less confusion than there was 12 months ago," Gualtieri said. "Executives just know that it is a big data technology, and that is enough for them."
OK, so what's this "MapReduce" thing then? It's part of Hadoop too, right? As Gualtieri explained in his video: "The second characteristic of Hadoop is its ability to process that data, or at least (provide) a framework for processing that data. That's called MapReduce."
But rather than take the conventional step of moving data over a network to be processed by software, MapReduce uses a smarter approach tailor-made for big data sets.
Moving data over a network "can be very, very slow, especially for really large data sets," Gualtieri added in the video. "Imagine if you're opening a really, really big file on your laptop, it takes a long, long time. It takes much longer than if it's a short, tiny file."
So rather than move the data to the software, MapReduce moves the processing software to the data.
Hadoop is still very complex to use, but many startups and established companies are creating tools to change that, a promising trend that should help remove much of the mystery and complexity that shrouds Hadoop today.
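The "move the processing to the data" idea can be sketched in a few lines of Python. This is a simplified illustration, not Hadoop's API: the `cluster` dictionary, the node names, and both functions are hypothetical, and a real MapReduce job also shuffles and sorts intermediate results between the map and reduce steps. Here each node counts the words in the lines it already holds, and only the small per-node tallies are combined afterward.

```python
from collections import Counter


def map_on_node(local_lines: list) -> Counter:
    """Runs where the data lives: count words in this node's own lines."""
    counts = Counter()
    for line in local_lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: list) -> Counter:
    """Merge the small per-node tallies into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical cluster: each node already stores part of a large data set.
cluster = {
    "node-a": ["big data big files"],
    "node-b": ["big data tools"],
}

# The counting function is sent to each node; the raw lines never move.
partials = [map_on_node(lines) for lines in cluster.values()]
totals = reduce_counts(partials)
```

Only the compact word tallies cross the network in this sketch, which is why the approach suits very large data sets.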
"Hadoop innovation is happening incredibly fast," said Gualtieri via email. "The open source community and commercial vendors are working like gangbusters to make SQL access super-fast on Hadoop. That will open up connections from many other tools like Tableau, and other BI tools that interface to data using SQL."
But don't use that last paragraph to explain Hadoop to novices, please.