News | 11/20/2013 04:00 PM

Does Hadoop Have a Speed Problem?

"Fast data" -- information provided in near-real time -- is key to getting the most out of analytics projects, says ParStream CEO Michael Hummel.

If you're familiar with the classic three Vs model of big data -- that's high volume, high velocity, and high variety -- you're also aware that big data means much more than simply stockpiling petabytes of information. The ability to analyze data streams in near-real time is essential, too, and a variety of tools is emerging to fill this need.

One such tool is ParStream, a real-time database for big-data analytics. Its developer, an enterprise software company also named ParStream, is based in Cologne, Germany. It was founded in 2008 by Michael Hummel and Jörg Bienert, the firm's CEO and CTO, respectively. In a phone interview with InformationWeek, Hummel said the big-data marketplace is changing rapidly as organizations begin to see the value of "fast data," the ability to analyze data nearly in real time. He also called Hadoop overhyped, but acknowledged that the open-source software framework has become a de facto standard of sorts for big data.

"Definitely there is a de facto standard, which is called Hadoop. But people do not actually mean Hadoop when they say Hadoop," said Hummel.

How so?

"Hadoop is a multilayered system, and the lowest level of the system is called HDFS: Hadoop Distributed File System. And that's what people mean when they say, 'We store all the data in Hadoop,' they actually mean that they store it on a distributed file system called HDFS. And all the systems use the data stored there, and can access it from there," he said. MapReduce is just one of many options, including ParStream, that can analyze this data, he added.


"Nowadays, there are solutions that can handle petabytes of data," said Hummel. "We are able to do it. The MapReduce approach was a very, very good first step in that direction. It made it possible." But in the world of big data, as with other emerging technologies, the first solution is usually not the best long-term choice, he said.

"Today, people today talk about speed, they talk about real time. They talk about making data accessible at your fingertips. So we're talking about sub-second response times. But Hadoop was not made for that. It was made for long-running queries that come back after 14 days."

ParStream offers a real-time database for big-data analytics.

Not surprisingly, that's a problem for organizations seeking fast-data analysis. Hadoop "was never made for interactive analytics, which is the big thing at the moment," said Hummel. "People and companies see it as absolutely relevant to be able to analyze data in a very, very short time." Fast data, he added, means "that you don't consider only the data from yesterday, but also the data from now."

For instance, fast-data analytics can benefit retail sites, particularly those where it's difficult to react immediately to shoppers' behaviors and make product or service recommendations. "That's a missed opportunity," said Hummel. "Think of people who fill up their shopping baskets, and then don't do anything on that website for five minutes. Perhaps they're not interested anymore, or they're distracted. Maybe they have found something better on a different website."

Fast-data analytics allow businesses to respond more rapidly to their customers and improve the buying experience. Hummel added: "Being able to engage with these people while they are still on the website -- or when they've just left -- makes much more sense than waiting until the next day to send out a reminder to say, 'OK, we'll give you a 5 percent bonus if you buy today.'"
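As a hypothetical illustration of the kind of query this scenario calls for, the sketch below polls a real-time analytics store over JDBC for carts that have sat idle for five minutes. The JDBC URL, table, and column names (cart_events, user_id, last_event_time) are invented for the example and do not reflect ParStream's actual API.

```java
// Hypothetical sketch: flagging shopping carts that have been idle for five
// minutes so the site can react while the shopper is still present.
// The JDBC URL and schema are placeholders for whatever real-time store is used.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class IdleCartMonitor {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:example://analytics-host:5432/shop", "user", "password")) {

            String sql =
                "SELECT user_id, cart_value " +
                "FROM cart_events " +
                "WHERE cart_status = 'open' " +
                "  AND last_event_time < now() - INTERVAL '5 minutes'";

            try (PreparedStatement stmt = conn.prepareStatement(sql);
                 ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // In a real system this would trigger an on-site prompt or e-mail.
                    System.out.printf("Idle cart: user=%s value=%.2f%n",
                            rs.getString("user_id"), rs.getDouble("cart_value"));
                }
            }
        }
    }
}
```

The design point is latency, not volume: the query runs continuously against live event data, so the response can reach the shopper before the session ends.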


Comments
MHummel, User Rank: Apprentice
11/22/2013 | 6:19:31 PM
Re: How fast is too fast?
Thomas, I don't think that fast data is the problem here; it's the automated responses that can become problematic. The data speed is simply an enabler.

Wall Street experienced problems because its automated trading systems not only recommend actions but also execute them, all without human intervention.

To avoid complications, and until these automated systems learn to weigh the consequences of their actions, we can rely on the recommendations but should always have decision-makers involved in the execution.
HM, User Rank: Strategist
11/21/2013 | 3:30:33 PM
Re: Big Data
Jeff, HPCC Systems is an open-source, data-intensive supercomputing platform for processing and solving big-data analytical problems, and it is extremely good at what it was designed to do: massive ingestion of semi-structured (or unstructured) data, conversion to a normalized form (like RDBMS tables), and analytics in an easy-to-use, SQL-like language (much more powerful than SQL). In addition, it integrates cleanly with Apache Kafka to provide near-real-time analytics. More info at http://hpccsystems.com.
virsingh211, User Rank: Strategist
11/21/2013 | 6:56:55 AM
Re: How fast is too fast?
I agree with you, Samicksha, but one thing for Hadoop fans to note is that Hadoop provides no security model; i.e., it cannot detect a man-in-the-middle attack between nodes.
samicksha, User Rank: Strategist
11/21/2013 | 2:59:57 AM
Re: How fast is too fast?
The impressive part of Hadoop is its flexibility: you can size the cluster to match the task at hand.
Thomas Claburn, User Rank: Author
11/20/2013 | 4:34:46 PM
How fast is too fast?
Given the problems Wall Street has had with automated trades, I wonder whether web sites that rely on rapid-fire analytics data and automated responses will self-optimize too quickly, magnifying problems rather than solving them.