Does Hadoop Have a Speed Problem?

Fast data -- information provided in near-real time -- is key to getting the most out of analytics projects, says ParStream CEO Michael Hummel.

Jeff Bertolucci, Contributor

November 20, 2013

ParStream offers a real-time database for big-data analytics.

If you're familiar with the classic three Vs model of big data -- that's high volume, high velocity, and high variety -- you're also aware that big data means much more than simply stockpiling petabytes of information. The ability to analyze data streams in near-real time is essential, too, and a variety of tools is emerging to fill this need.

One such tool is ParStream, a real-time database for big-data analytics. Its developer, an enterprise software company also named ParStream, is based in Cologne, Germany. It was founded in 2008 by Michael Hummel and Jörg Bienert, the firm's CEO and CTO, respectively. In a phone interview with InformationWeek, Hummel said the big-data marketplace is changing rapidly as organizations begin to see the value of "fast data," the ability to analyze data nearly in real time. He also called Hadoop overhyped, but acknowledged that the open-source software framework has become a de facto standard of sorts for big data.

"Definitely there is a de facto standard, which is called Hadoop. But people do not actually mean Hadoop when they say Hadoop," said Hummel.

How so?

"Hadoop is a multilayered system, and the lowest level of the system is called HDFS: Hadoop Distributed File System. And that's what people mean when they say, 'We store all the data in Hadoop,' they actually mean that they store it on a distributed file system called HDFS. And all the systems use the data stored there, and can access it from there," he said. MapReduce is just one of many options, including ParStream, that can analyze this data, he added.


"Nowadays, there are solutions that can handle petabytes of data," said Hummel. "We are able to do it. The MapReduce approach was a very, very good first step in that direction. It made it possible." But in the world of big data, as with other emerging technologies, the first solution is usually not the best long-term choice, he said.

"Today, people talk about speed, they talk about real time. They talk about making data accessible at your fingertips. So we're talking about sub-second response times. But Hadoop was not made for that. It was made for long-running queries that come back after 14 days."

Not surprisingly, that's a problem for organizations seeking fast-data analysis. Hadoop "was never made for interactive analytics, which is the big thing at the moment," said Hummel. "People and companies see it as absolutely relevant to be able to analyze data in a very, very short time." Fast data, he added, means "that you don't consider only the data from yesterday, but also the data from now."

For instance, fast-data analytics can benefit retail sites, particularly those where it's difficult to react immediately to shoppers' behaviors and make product or service recommendations. "That's a missed opportunity," said Hummel. "Think of people who fill up their shopping baskets, and then don't do anything on that website for five minutes. Perhaps they're not interested anymore, or they're distracted. Maybe they have found something better on a different website."

Fast-data analytics allow businesses to respond more rapidly to their customers and improve the buying experience. Hummel added: "Being able to engage with these people while they are still on the website -- or when they've just left -- makes much more sense than waiting until the next day to send out a reminder to say, 'OK, we'll give you a 5 percent bonus if you buy today.'"
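To make the shopping-basket scenario concrete, here is a minimal Python sketch of the kind of check a fast-data system might run continuously. It is illustrative only and is not ParStream's implementation: the `Session` record, the `find_idle_baskets` function, and the field names are all hypothetical, and the five-minute threshold is taken from Hummel's example.

```python
from dataclasses import dataclass

# Five-minute pause from Hummel's example; the constant name is hypothetical.
IDLE_THRESHOLD_SECONDS = 5 * 60


@dataclass
class Session:
    """Hypothetical record of one shopper's visit."""
    session_id: str
    items_in_basket: int
    last_activity: float  # timestamp in seconds


def find_idle_baskets(sessions, now, threshold=IDLE_THRESHOLD_SECONDS):
    """Return IDs of sessions with a non-empty basket and no activity
    for at least `threshold` seconds."""
    return [
        s.session_id
        for s in sessions
        if s.items_in_basket > 0 and now - s.last_activity >= threshold
    ]


if __name__ == "__main__":
    sessions = [
        Session("a", items_in_basket=2, last_activity=0.0),
        Session("b", items_in_basket=0, last_activity=0.0),
        Session("c", items_in_basket=1, last_activity=200.0),
    ]
    print(find_idle_baskets(sessions, now=300.0))
```

The point of the sketch is the timing: the check is only useful if it runs against current activity data, while the shopper is still on the site or has just left, rather than against yesterday's batch.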


About the Author

Jeff Bertolucci

Contributor

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.
