Phu Hoang, who originally built Hadoop-based systems at Yahoo before co-founding the batch and stream processing company DataTorrent, understands the limitations of running a batch-only platform like Hadoop, and the need many enterprises have for real-time analytics, or at least analytics that shorten their mean time to decision.
“People who are seeing demonstrations of it, proofs of concept, are starting to really face that incredible opportunity of ‘Oh my goodness, the data I’ve been computing, which I was getting once a day or every eight hours, I literally can have in minutes or seconds,’” says Hoang.
This real-time and near-real-time analytics revolution is already in motion, with compelling use cases.
However, convincing business cases aside, real-time or near-real-time analytics is only as good as the underlying technologies that deliver it. That is why it is encouraging to see some of the recent breakthroughs in hardware and software that will facilitate it.
Here are some of these technological developments.
New semiconductor technology speeds big data processing and can bypass the ETL stage of big data preparation. Field-programmable gate arrays (FPGAs) can take over many of the functions of standard x86-style processors, eliminating the need to string together clusters of x86 machines just to muster enough power for a real-time or near-real-time analytics effort. When FPGA technology is combined with data striped across solid-state disks (SSDs) to increase processing speed, raw data flowing in from the network’s edge can move rapidly into the hands of business users, and can be resident on a single box. The architecture still uses an x86 frontend processor, but that frontend works in concert with an FPGA fabric that takes on the lion’s share of the processing work. The fabric is abstracted from end users, who only have to worry about the result of an analytics query that will likely complete in half the time it took before.
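The key idea here is the offload pattern: the user-facing query interface stays the same, while the backend decides whether work runs on the host CPU or is handed to an accelerator. A minimal sketch of that pattern in Python, with every function name hypothetical and the "FPGA" path a plain stand-in rather than real accelerator code:

```python
import statistics

def run_on_cpu(values):
    # Ordinary host-side computation.
    return statistics.fmean(values)

def run_on_fpga(values):
    # Stand-in for dispatching to an FPGA fabric; a real system would
    # hand the buffer to the accelerator and await the result.
    return sum(values) / len(values)

def analytics_query(values, accelerator_available=True):
    """The user only ever calls this; the backend choice is abstracted away."""
    if accelerator_available and len(values) > 1000:
        return run_on_fpga(values)  # large jobs go to the fabric
    return run_on_cpu(values)       # small jobs stay on the x86 frontend

data = [float(i) for i in range(10_000)]
print(analytics_query(data))  # 4999.5, regardless of which path ran
```

The point of the sketch is that both paths return the same answer, so the routing decision never leaks into the analyst's workflow.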
Semantic layers on top of Hadoop make user access easier, faster, and more secure. One such method is the online analytical processing (OLAP) cube, a logical interface over raw Hadoop that lets business analytics users easily produce their own queries without worrying about the underlying complexities of Hadoop. With a logical semantic layer in place, a business user without an IT background can get answers to many queries on their own in five seconds or less.
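Conceptually, a cube pre-aggregates raw fact records along named business dimensions so that queries hit a small summary instead of the raw data. A toy illustration of that idea in Python (the records, dimension names, and helper functions are all hypothetical, not any particular product's API):

```python
from collections import defaultdict

# Hypothetical raw fact records, as they might land in Hadoop.
SALES = [
    {"region": "EMEA", "quarter": "Q1", "product": "widget", "revenue": 120.0},
    {"region": "EMEA", "quarter": "Q2", "product": "widget", "revenue": 95.0},
    {"region": "APAC", "quarter": "Q1", "product": "gadget", "revenue": 200.0},
    {"region": "APAC", "quarter": "Q1", "product": "widget", "revenue": 80.0},
]

def build_cube(facts, dimensions, measure):
    """Pre-aggregate facts into a cube keyed by the chosen dimensions."""
    cube = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)

def query(cube, dimensions, **filters):
    """Answer a business question against the cube, not the raw data."""
    total = 0.0
    for key, value in cube.items():
        named = dict(zip(dimensions, key))
        if all(named[d] == v for d, v in filters.items()):
            total += value
    return total

dims = ("region", "quarter")
cube = build_cube(SALES, dims, "revenue")
print(query(cube, dims, region="EMEA"))  # 215.0 — all EMEA quarters
print(query(cube, dims, quarter="Q1"))   # 400.0 — all Q1 regions
```

The business user only ever touches dimension names like `region` and `quarter`; how the facts are stored and aggregated underneath is the semantic layer's problem.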
New middleware messaging protocols such as the Data Distribution Service (DDS) are another development. DDS can both publish and accept data from multiple sources in real time, then transport that data into real-time “moving databases” that let companies act on it immediately.
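At its core DDS is topic-based publish/subscribe over typed data. A toy Python sketch of that model follows; it is an in-process illustration only, with hypothetical names, and a real DDS implementation adds peer discovery, QoS policies, and network transport:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SensorReading:
    sensor_id: str
    value: float

class TopicBus:
    """Toy stand-in for a DDS-style topic: typed samples fan out to subscribers."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Every subscriber sees the sample as soon as it is published.
        for callback in self._subscribers[topic]:
            callback(sample)

bus = TopicBus()
latest = {}  # the "moving database": the most recent sample per sensor
bus.subscribe("temperature", lambda s: latest.update({s.sensor_id: s.value}))

bus.publish("temperature", SensorReading("edge-01", 21.5))
bus.publish("temperature", SensorReading("edge-01", 22.1))
print(latest["edge-01"])  # 22.1 — always the current value, no batch wait
```

The "moving database" here is just the `latest` dictionary: it is continuously overwritten by the stream, so a query against it always reflects the present rather than the last batch run.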
For any new technology, there are always champions and detractors, with much of the enterprise community sitting on the sidelines while industry consortia and vendors decide which methodologies will prevail. But regardless of which set of technologies wins out, mean times to decision for enterprises using analytics will get shorter, and it isn’t too soon to start visualizing business cases for real-time information that can transform your company.