You may have seen quite a few success stories about enterprises implementing Hadoop and related technologies as part of their big data analytics programs. You may wonder if you are behind the curve if you haven't implemented this open source big data technology yet. Here's some reassurance for you -- while there may be plenty of success stories to tell, they are the exception. Many projects never get started, and many of those that do will fail.
Gartner is forecasting that through 2018, 70% of Hadoop deployments will fail to meet cost savings and revenue generation objectives due to skills and integration challenges. It's a prediction the company actually made back in 2016, and according to Gartner Research VP Merv Adrian, it's a prediction that's held up pretty well. Adrian provided an update for enterprise organizations -- Hadoop and Spark: Understanding Open Source Opportunities and Risks -- at the Gartner Data and Analytics Summit in Grapevine, Texas this month.
"I've never had a one-on-one inquiry with a client who was willing to acknowledge that some of what they are doing might fail," Adrian said in the session. Yet many organizations are embarking on Hadoop and Spark projects that ultimately may not succeed.
What's more, there aren't many enterprises that are actually in production with Hadoop projects.
"We asked people directly, have you put this into production? Are there users depending on it?" said Adrian. "For three successive years the number of deployments for these kinds of projects for our clients has been about 15%. It's hardly moved."
Clearly there's a big disconnect between organizations inquiring about implementing Hadoop and organizations actually implementing Hadoop. Adrian said the reason for the gap is that technologies such as Hadoop and Spark are new, difficult, and complicated.
Nonetheless, organizations still ask about these technologies and may have them on their roadmaps. When it comes to Hadoop and Spark, organizations want to know which to use, and when and where, according to Adrian.
Spark or Hadoop?
"At the very simple level, one of them works on disk and the other one works in-memory, but that's much, much too simple. Both of them do both," Adrian said.
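The disk-versus-memory tradeoff Adrian describes can be illustrated outside either framework. The following is a plain-Python sketch, not Spark or Hadoop code: a "disk-style" pipeline recomputes an intermediate result for every downstream step, while an "in-memory-style" pipeline materializes it once and reuses it (roughly what Spark's caching of intermediate results buys you).

```python
# Plain-Python illustration (not the Spark/Hadoop API) of the
# recompute-vs-cache tradeoff behind "on disk" vs "in memory".

def expensive_transform(records):
    """Stand-in for a job stage that rereads and reprocesses its input."""
    return [r * 2 for r in records]

records = list(range(10_000))

# "On-disk" style: every downstream step recomputes from the source.
total_a = sum(expensive_transform(records))
total_b = sum(expensive_transform(records))  # same work done a second time

# "In-memory" style: materialize the intermediate result once, reuse it.
cached = expensive_transform(records)  # loosely analogous to caching in Spark
total_c = sum(cached)
total_d = sum(cached)  # reuses the in-memory result, no recomputation

assert total_a == total_c and total_b == total_d
```

The results are identical either way; what differs is how many times the transform runs, which is why memory pressure and dataset size drive the choice in practice.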
Hadoop has evolved to incorporate technology for data streaming, too, and organizations are spending more time thinking about data in motion.
"We all have to deal with data in motion now...we as data management people didn't think too much about this stuff before until it got saved," Adrian said. "...Data in motion is one of the most important topics we think about now."
Meanwhile, a lot of streaming work tends to go to Spark because much of what you want to do there works better in-memory, Adrian said.
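"Data in motion" means processing records as they arrive rather than after they have been saved to storage. A framework-free sketch of the pattern (plain Python, not Spark's streaming API): state is updated per event, and results are available mid-stream instead of only after a batch completes.

```python
# Plain-Python sketch of stream processing: update state as each event
# arrives, instead of batch-processing data after it has been saved.
from collections import defaultdict

def stream_counts(events):
    """Yield the running count per event type after each incoming event."""
    counts = defaultdict(int)
    for event_type in events:
        counts[event_type] += 1
        yield dict(counts)  # a snapshot is available immediately, mid-stream

incoming = ["click", "view", "click", "purchase", "click"]
snapshots = list(stream_counts(incoming))
final = snapshots[-1]
# final == {"click": 3, "view": 1, "purchase": 1}
```

The batch alternative would compute the same final counts, but only after the whole dataset had landed; the streaming shape is what makes the intermediate snapshots possible.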
Challenges and obstacles
Organizations that are deploying these technologies need to look at their workflow and ask the following questions:
- How stable and reproducible is it?
- What do you do when part of it fails?
- How are you monitoring it?
- How are you monitoring it when you move it to the cloud?
Adrian said that it's important to continue monitoring the data even if you outsource the work to the cloud.
"You can't outsource responsibility," he said. "Just because you move it to the cloud doesn't mean you don't have to think about it."
Key use cases
So what are the key use cases for Hadoop, now and in the next few years? Do your organization's plans look similar to the plans of your peers? The following are some of the ways that enterprises are using or plan to use this technology:
- Modernize their infrastructure
- Economically refactor their storage to put the less important, less frequently used, less risky data into a lower cost layer
- ETL, now and in the future
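ETL is the workload most closely associated with the map-shuffle-reduce shape that Hadoop's MapReduce popularized: extract and transform records in a map phase, group intermediate results by key, then aggregate in a reduce phase. A minimal plain-Python sketch of that shape (illustrative only, not the Hadoop API), using the classic word count:

```python
# Plain-Python sketch of the map -> shuffle -> reduce shape behind
# MapReduce-style ETL (illustrative only, not the Hadoop API).
from collections import defaultdict

def map_phase(lines):
    """Extract/transform: emit (word, 1) pairs from raw text lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Group intermediate pairs by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values into the final output record."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big plans", "big wins"]
result = reduce_phase(shuffle_phase(map_phase(lines)))
# result == {"big": 3, "data": 1, "plans": 1, "wins": 1}
```

In a real cluster the map and reduce phases run in parallel across nodes and the shuffle moves data over the network, but the logical pipeline is the same three steps.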
Adrian said that one thing Hadoop won't be is a replacement for an organization's old data warehouse.
"It might complement it, enhance it, or supplement it," he said. "It won't replace it. Not if your data warehouse is being used for anything important, has a time expectation associated with it, has an accuracy expectation associated with it, a policy-based expectation associated with it. Those are the areas where Hadoop has some work to be done."
So if your Hadoop implementation is still in pilot, if it's still in the planning stages, or if it's still just an idea, rest assured. You are in good company.