Continuous, real-time analysis based on stream processing could be the next big thing in big data.
One of the knocks against Apache Hadoop has been that it was built as a batch processing system and hence is no good for real-time data analytics. Hadoop 2.0 promises a lot of improvement in this area, however. Its YARN resource management layer, for instance, offers better support for stream-processing platforms such as Storm, which recently became an Apache open-source project. Hadoop's shortcomings have also created an opportunity for stream-processing technology providers, which have been busy partnering up with Hadoop vendors.
A growing number of companies are entering the real-time, stream-processing space, including Vitria, a 20-year-old Silicon Valley firm. According to Vitria co-founder and chief technical officer Dr. Dale Skeen, the market for continuous, real-time analysis is quickly evolving from "visionary" early adopters to more mainstream use.
"We're seeing the transition into what I would call the early majority market of this new technique," Dr. Skeen said in a phone interview with InformationWeek.
Skeen knows the big-data market well. He co-founded Vitria with Dr. JoMei Chang in 1994 and has more than 20 years of experience building large-scale distributed computing and database systems. Before starting Vitria, Skeen co-founded Tibco Software, an infrastructure software provider, and he has held faculty positions at the University of California, Berkeley, and Cornell University.
There's an important distinction -- one often misunderstood -- between continuous, real-time streaming analytics and other types of operational intelligence tools that offer "on-demand, near real-time" analytics built more for forensic analysis, Skeen said.
"We build a real-time operational intelligence platform," said Skeen. "We're talking about a type that is continuously monitoring based on streaming analytics. It's constantly assessing the situation and immediately taking action if something goes awry -- or if an opportunity presents itself."
By comparison, the on-demand, near-real-time approach has a different set of attributes.
"It's very valid technology, but it's mainly used for investigations," said Skeen. "With on-demand, you have to ask the right question at the right time... and then you get the answer back."
Miss a significant event, he added, and you may miss the chance to correct a critical issue or seize a business opportunity.
"Then you're flying blind, and that's the big drawback with on-demand," Skeen claimed.
The continuous real-time approach, however, is always monitoring.
"The moment something interesting happens, where there's an opportunity to sell more to a customer, or there's a threat -- a bad guy is trying to break into your system or get money -- you immediately detect that and can take action," Skeen noted.
He added: "Everyone talks about actionable intelligence, well, we have real-time intelligence with action. You can completely automate some of these actions with business processes or rules... or make human-guided workflow."
In industries such as banking, these scenarios are critical: minutes, seconds, or even milliseconds matter.
"Fraud, for example -- dispensing cash at an ATM," said Skeen. "Would you rather discover it after the fact and investigate why it happened and why you dispensed that cash in that situation? Or would you rather discover it while it's happening and perhaps be able to shut it down?"
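The difference Skeen describes can be illustrated with a small Python sketch. It is only an illustration of the general idea, not Vitria's platform or API: the class `ContinuousMonitor`, the function `on_demand_query`, the 60-second window, the 1,000 cash limit, and the sample withdrawals are all hypothetical. The continuous monitor evaluates its rule on every event as it arrives, while the on-demand query produces an answer only when someone thinks to ask.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # hypothetical sliding-window length
LIMIT = 1000         # hypothetical cash limit per card per window


class ContinuousMonitor:
    """Evaluates the fraud rule on every event, as the event arrives."""

    def __init__(self, on_alert):
        self.on_alert = on_alert
        self.recent = defaultdict(deque)  # card -> (timestamp, amount) pairs

    def handle(self, timestamp, card, amount):
        window = self.recent[card]
        window.append((timestamp, amount))
        # Drop withdrawals that have aged out of the sliding window.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        total = sum(a for _, a in window)
        if total > LIMIT:
            # Act immediately, while the activity is still in progress.
            self.on_alert(card, timestamp, total)


def on_demand_query(stored_events, card):
    """Runs only when someone asks; totals a card's stored withdrawals."""
    return sum(a for _, c, a in stored_events if c == card)


# Hypothetical withdrawals: (timestamp in seconds, card, amount).
events = [
    (0, "card-A", 400),
    (10, "card-B", 200),
    (20, "card-A", 400),
    (30, "card-A", 400),
    (200, "card-A", 100),
]

alerts = []
monitor = ContinuousMonitor(lambda card, ts, total: alerts.append((card, ts, total)))
for ts, card, amount in events:
    monitor.handle(ts, card, amount)

print(alerts)                             # alert raised at t=30, mid-stream
print(on_demand_query(events, "card-A"))  # answer exists only once queried
```

In this sketch the monitor flags card-A at the third withdrawal, when a real system could still block the transaction. The on-demand query returns a correct total too, but only after the fact and only if an analyst asks about that card.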
Vitria's customers include European mobile carrier O2, which runs the company's stream-processing platform for spam and fraud detection, as well as for customer service.
Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.