"Fast data" -- information provided in near-real time -- is key to getting the most out of analytics projects, says ParStream CEO Michael Hummel.
If you're familiar with the classic three Vs model of big data -- that's high volume, high velocity, and high variety -- you're also aware that big data means much more than simply stockpiling petabytes of information. The ability to analyze data streams in near-real time is essential, too, and a variety of tools is emerging to fill this need.
One such tool is ParStream, a real-time database for big-data analytics. Its developer, an enterprise software company also named ParStream, is based in Cologne, Germany. It was founded in 2008 by Michael Hummel and Jörg Bienert, the firm's CEO and CTO, respectively. In a phone interview with InformationWeek, Hummel said the big-data marketplace is changing rapidly as organizations begin to see the value of "fast data," the ability to analyze data nearly in real time. He also called Hadoop overhyped, but acknowledged that the open-source software framework has become a de facto standard of sorts for big data.
"Definitely there is a de facto standard, which is called Hadoop. But people do not actually mean Hadoop when they say Hadoop," said Hummel.
"Hadoop is a multilayered system, and the lowest level of the system is called HDFS: Hadoop Distributed File System. And that's what people mean when they say, 'We store all the data in Hadoop,' they actually mean that they store it on a distributed file system called HDFS. And all the systems use the data stored there, and can access it from there," he said. MapReduce is just one of many options, including ParStream, that can analyze this data, he added.
"Nowadays, there are solutions that can handle petabytes of data," said Hummel. "We are able to do it. The MapReduce approach was a very, very good first step in that direction. It made it possible." But in the world of big data, as with other emerging technologies, the first solution is usually not the best long-term choice, he said.
"Today, people talk about speed, they talk about real time. They talk about making data accessible at your fingertips. So we're talking about sub-second response times. But Hadoop was not made for that. It was made for long-running queries that come back after 14 days."
Not surprisingly, that's a problem for organizations seeking fast-data analysis. Hadoop "was never made for interactive analytics, which is the big thing at the moment," said Hummel. "People and companies see it as absolutely relevant to be able to analyze data in a very, very short time." Fast data, he added, means "that you don't consider only the data from yesterday, but also the data from now."
For instance, fast-data analytics can benefit retail sites, particularly those where it's difficult to react immediately to shoppers' behaviors and make product or service recommendations. "That's a missed opportunity," said Hummel. "Think of people who fill up their shopping baskets, and then don't do anything on that website for five minutes. Perhaps they're not interested anymore, or they're distracted. Maybe they have found something better on a different website."
Fast-data analytics allow businesses to respond more rapidly to their customers and improve the buying experience. Hummel added: "Being able to engage with these people while they are still on the website -- or when they've just left -- makes much more sense than waiting until the next day to send out a reminder to say, 'OK, we'll give you a 5 percent bonus if you buy today.'"
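Hummel's shopping-basket scenario comes down to a simple rule: flag any shopper who has items in the basket but has shown no activity for five minutes. The Python sketch below illustrates that rule only. It is not ParStream's implementation, and the `Session` record, its fields, and the `find_idle_baskets` function are hypothetical names invented for this example. A production system would evaluate the rule continuously against a live event stream rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The five-minute gap comes from Hummel's example; tune it as needed.
IDLE_THRESHOLD = timedelta(minutes=5)


@dataclass
class Session:
    """Hypothetical record of one shopper's current visit."""
    shopper_id: str
    basket_items: int        # number of items currently in the basket
    last_activity: datetime  # timestamp of the shopper's most recent click


def find_idle_baskets(sessions, now, threshold=IDLE_THRESHOLD):
    """Return IDs of shoppers with a non-empty basket who have been
    inactive for at least `threshold`."""
    return [
        s.shopper_id
        for s in sessions
        if s.basket_items > 0 and now - s.last_activity >= threshold
    ]


# Small illustrative data set (made up for this sketch).
now = datetime(2014, 1, 1, 12, 0, 0)
sessions = [
    Session("a", 3, now - timedelta(minutes=7)),   # full basket, idle 7 min
    Session("b", 2, now - timedelta(minutes=1)),   # full basket, still active
    Session("c", 0, now - timedelta(minutes=30)),  # idle, but basket is empty
]

idle = find_idle_baskets(sessions, now)
print(idle)
```

Only shopper "a" is flagged here: "b" is still active and "c" has nothing in the basket. The point of fast data is that a check like this runs while the shopper is still on the site, not in a batch job the following day.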