2/10/2014
11:55 AM

Real-Time Analytics: Ready For Its Close-Up?

Continuous, real-time analysis based on stream processing could be the next big thing in big data.


One of the knocks against Apache Hadoop has been that it was built as a batch-processing system and is therefore poorly suited to real-time data analytics. Hadoop 2.0 promises significant improvement in this area, however. Its YARN resource-management layer, for instance, offers better support for stream-processing platforms such as Storm, which recently became an Apache open-source project. Hadoop's shortcomings have also created an opportunity for stream-processing technology providers, which have been busy partnering with Hadoop vendors.
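To make the batch-versus-stream distinction concrete, here is a minimal sketch in plain Python. It is not Hadoop or Storm code, and the function names are hypothetical. The batch version produces an answer only after the whole dataset has been collected; the streaming version updates its state as each event arrives, so a current result is available at any moment.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(events: list[str]) -> Counter:
    """Batch style: one answer, available only after all events are collected."""
    return Counter(events)


def streaming_count(events: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update state per event and emit a running result."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        # A snapshot of the current totals is available after every event.
        yield counts.copy()
```

For example, feeding ["login", "purchase", "login", "logout"] to streaming_count yields four running snapshots, the last of which matches what batch_count returns for the full list.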

A growing number of companies are entering the real-time, stream-processing space, including Vitria, a 20-year-old Silicon Valley firm. According to Vitria co-founder and chief technical officer Dr. Dale Skeen, the market for continuous, real-time analysis is quickly evolving from "visionary" early adopters to more mainstream use.

"We're seeing the transition into what I would call the early majority market of this new technique," Dr. Skeen said in a phone interview with InformationWeek.

Skeen knows the big-data market well. He co-founded Vitria with Dr. JoMei Chang in 1994 and has more than 20 years of experience building large-scale distributed computing and database systems. Before starting Vitria, Skeen co-founded Tibco Software, an infrastructure software provider. He has also held faculty positions at the University of California, Berkeley, and Cornell University.


There's an important distinction -- one often misunderstood -- between continuous, real-time streaming analytics and other types of operational intelligence tools that offer "on-demand, near real-time" analytics built more for forensic analysis, Skeen said.

"We build a real-time operational intelligence platform," said Skeen. "We're talking about a type that is continuously monitoring based on streaming analytics. It's constantly assessing the situation and immediately taking action if something goes awry -- or if an opportunity presents itself."

By comparison, the on-demand, near-real-time approach has a different set of attributes.

"It's very valid technology, but it's mainly used for investigations," said Skeen. "With on-demand, you have to ask the right question at the right time... and then you get the answer back."

The on-demand approach puts the burden on the user to know what to ask and when, he added. Miss a significant event, and you may miss the chance to correct a critical issue or take advantage of a business opportunity.

"Then you're flying blind, and that's the big drawback with on-demand," Skeen claimed.

The continuous real-time approach, however, is always monitoring.

"The moment something interesting happens, where there's an opportunity to sell more to a customer, or there's a threat -- a bad guy is trying to break into your system or get money -- you immediately detect that and can take action," Skeen noted.
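The distinction Skeen draws can be sketched in a few lines of Python. This is an illustration under simple assumptions, not Vitria's platform: the Event fields and class names are hypothetical. The on-demand store answers only when someone queries it, while the continuous monitor evaluates a standing condition against every event and reacts immediately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    # Hypothetical event shape for illustration.
    account: str
    kind: str
    amount: float


class OnDemandStore:
    """On-demand style: events are stored; answers come only when queried."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def ingest(self, event: Event) -> None:
        self.events.append(event)

    def query(self, predicate: Callable[[Event], bool]) -> list[Event]:
        # Nothing is found unless the user asks the right question.
        return [e for e in self.events if predicate(e)]


@dataclass
class ContinuousMonitor:
    """Continuous style: a standing condition is checked on every event."""

    condition: Callable[[Event], bool]
    on_match: Callable[[Event], None]

    def ingest(self, event: Event) -> None:
        if self.condition(event):
            self.on_match(event)  # react as soon as the event arrives
```

Both objects can see the same events; the difference is that the monitor's callback fires during ingestion, whereas the store reveals a match only if and when someone runs the matching query.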

He added: "Everyone talks about actionable intelligence, well, we have real-time intelligence with action. You can completely automate some of these actions with business processes or rules... or make human-guided workflow."
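The idea of fully automated actions versus human-guided workflow can be illustrated with a toy rule engine. The rule names, event fields, and thresholds below are hypothetical; a production system would use a real business-rules or workflow product. Each rule either triggers an automated action immediately or places the event in a queue for a person to review.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an event is a plain dict of named fields.
Event = dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]
    automated: bool  # True: act immediately; False: route to a person


class RuleEngine:
    def __init__(self, rules: list[Rule]) -> None:
        self.rules = rules
        self.actions_taken: list[str] = []                # automated actions
        self.review_queue: list[tuple[str, Event]] = []   # human-guided workflow

    def process(self, event: Event) -> None:
        # Evaluate every rule against each incoming event.
        for rule in self.rules:
            if not rule.condition(event):
                continue
            if rule.automated:
                self.actions_taken.append(f"{rule.name}:{event['id']}")
            else:
                self.review_queue.append((rule.name, event))
```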

In industries such as banking, these are critical scenarios where minutes, seconds, or even milliseconds matter.

"Fraud, for example -- dispensing cash at an ATM," said Skeen. "Would you rather discover it after the fact and investigate why it happened and why you dispensed that cash in that situation? Or would you rather discover it while it's happening and perhaps be able to shut it down?"
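The ATM example maps naturally onto a sliding-window check evaluated before cash is dispensed. The sketch below is an illustration under stated assumptions, not a real fraud model: it simply refuses a withdrawal when a card has already made max_withdrawals withdrawals within the last window_seconds seconds, and both limits are hypothetical.

```python
from collections import defaultdict, deque


class WithdrawalVelocityCheck:
    """Flags a card that makes too many withdrawals inside a time window.

    The default limits are illustrative assumptions, not real
    fraud-detection parameters.
    """

    def __init__(self, max_withdrawals: int = 3, window_seconds: float = 60.0) -> None:
        self.max_withdrawals = max_withdrawals
        self.window_seconds = window_seconds
        # Per-card timestamps of recent approved withdrawals.
        self.recent: dict[str, deque] = defaultdict(deque)

    def allow(self, card_id: str, timestamp: float) -> bool:
        """Return False to stop the dispense while the event is still happening."""
        window = self.recent[card_id]
        # Drop withdrawals that have slid out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_withdrawals:
            return False
        window.append(timestamp)
        return True
```

Refused attempts are not added to the window, so a card becomes usable again once older withdrawals age out; a real system might instead record refusals or escalate them for review.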

Vitria's customers include European mobile carrier O2, which runs the company's stream-processing platform for spam and fraud detection, as well as for customer service.


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
Hoganator
2/10/2014 | 5:32:48 PM
Real-Time Analytics
Hadoop opened people's eyes to the value found in large pools of data, but the delay imposed by batch processing dilutes that value considerably. Real-time (or near-real-time) analytics are far more valuable. Stream processors (like Twitter Storm, LinkedIn Samza, Yahoo S4, Amazon Kinesis, Microsoft StreamInsight, etc.) provide the ability to process, filter, count, and transform data in real time, but the window of data visibility is limited and fleeting. Think of it as looking out the side window of a car doing 100 MPH: you can count things and do limited "processing," but only by looking back do you get the true lay of the land.

Stream processors require persistence to enable historical analytics. The problem is that (a) in-memory systems are far too expensive given the huge data volume, and (b) disk-based systems cannot keep up with the flood of fast data, causing an impedance mismatch. Further, as with Hadoop, spurring corporate adoption requires SQL support. Hadoop+YARN is interesting, but its underlying large-grain file system is incompatible with the smaller, "block-sized" data found in streams.

Resources: 

http://scaledb.blogspot.com/2014/01/stream-processors-and-dbms-persistence.html

http://scaledb.com/high-velocity-data.php
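The commenter's argument, that a stream processor sees only a fleeting window and needs persistence plus SQL for historical analytics, can be sketched with Python's built-in sqlite3 module. The class and table names are hypothetical, SQLite merely stands in for whatever store a real deployment would use, and events are assumed to arrive in timestamp order. Counts are kept in memory only for the current tumbling window; when a window closes, its totals are written to a table that can be queried later with ordinary SQL.

```python
import sqlite3
from collections import Counter
from typing import Optional


class WindowedCounterWithStore:
    """Counts events per tumbling window and persists each closed window.

    Assumes events arrive in timestamp order. Only the current window is
    held in memory; closed windows are written to a SQL table.
    """

    def __init__(self, conn: sqlite3.Connection, window_seconds: int = 60) -> None:
        self.conn = conn
        self.window_seconds = window_seconds
        self.current_window: Optional[int] = None
        self.counts: Counter = Counter()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS window_counts "
            "(window_start INTEGER, key TEXT, n INTEGER)"
        )

    def _flush(self) -> None:
        # Write the in-memory counts for the current window, then reset them.
        if self.current_window is None:
            return
        start = self.current_window * self.window_seconds
        self.conn.executemany(
            "INSERT INTO window_counts VALUES (?, ?, ?)",
            [(start, key, n) for key, n in self.counts.items()],
        )
        self.conn.commit()
        self.counts.clear()

    def add(self, key: str, timestamp: float) -> None:
        window = int(timestamp // self.window_seconds)
        if window != self.current_window:
            self._flush()  # the previous window has closed; persist it
            self.current_window = window
        self.counts[key] += 1

    def close(self) -> None:
        # Persist whatever is left in the final, still-open window.
        self._flush()
        self.current_window = None
```

Once windows are persisted, a historical question becomes a plain query, for example SELECT key, SUM(n) FROM window_counts GROUP BY key.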
