Phu Hoang, who built Hadoop-based systems at Yahoo before co-founding the batch and streaming data company DataTorrent, understands the limitations of a batch-only platform like Hadoop and the need that many enterprises have for real time analytics, or at least analytics that give them faster mean times to decision.
“People who are seeing demonstrations of it, proofs of concepts, are starting to really face that incredible opportunity of ‘Oh my goodness, the data I’ve been computing, which I was getting once a day or every eight hours, I literally can have in minutes or seconds,’” says Hoang.
This real time and near real time analytics revolution is already in motion with compelling use cases.
- Financial services companies use real time fraud detection systems to stop fraudulent transactions before they complete. They do this by analyzing their customers’ usage patterns in real time, flagging any usage that deviates from these patterns, and then instantly locking down the card at the register so the transaction can’t go through. The stakes are high. In the U.S. alone, 31.8 million persons had their credit cards breached in 2014.
- Logistics companies track goods in transport in real time. They can also observe failures of temperature and humidity controls in closed shipment containers, as well as tampering or breakage of container seals. These analytics are driven by sensor-generated machine data that flows over the Internet and is filtered and analyzed in real time, enabling companies to take immediate action on situations that threaten cargo half a world away. In one case, a major oil and gas exploration company operating in Africa reduced cargo thefts on transport trucks from 50 percent to four percent by attaching sensors to payloads. Needless to say, operating expenses and time to market both benefitted.
- Retailers use clickstream analytics to launch instantaneous promotions and sales offers. They capture the customer’s interest and wallet at the point of sale, and sell even more. Retailers can also track inventory and instantly replenish supplies in areas that demonstrate high demand.
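The fraud detection case above, flagging usage that deviates from a customer's established pattern, can be illustrated with a very small sketch. It assumes a simple rule that is not taken from any particular vendor's system: a transaction is suspicious if its amount lies more than a chosen number of standard deviations from the customer's past spending. The function name `is_suspicious`, the threshold of 3.0 and the sample amounts are all hypothetical, and production systems weigh far more signals than the amount alone.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Return True if `amount` deviates from past spending by more than
    `threshold` standard deviations (an assumed, illustrative rule)."""
    if len(history) < 2:
        return False  # too little data to establish a pattern
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical; any different amount is a deviation.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past card transactions for one customer, in dollars.
history = [23.50, 41.00, 18.75, 36.20, 29.99]

print(is_suspicious(history, 32.00))   # close to the usual spending
print(is_suspicious(history, 950.00))  # far outside the usual spending
```

In a real time setting, a check of this kind would run on each transaction as it arrives, so the card can be locked before the purchase completes.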
However, while all of these are convincing business use cases, real time or near real time analytics are only as good as the underlying technologies that deliver them. This is why it is encouraging to see some of the recent breakthroughs in hardware and software that will facilitate these analytics.
Here are some of these technological developments.
New semiconductor technology speeds big data processing and can bypass the ETL stage of preparing big data. Field Programmable Gate Arrays (FPGAs) can replace many of the functions of standard x86 style processor chips and also eliminate the need to cluster multiple x86 processors to get enough power for a real time or near real time data analytics effort. When FPGA technology is combined with data that is striped across solid state disk (SSD) to increase processing speeds, raw data flowing into the enterprise from the network’s edge can move rapidly into the hands of business users while residing on a single box. The technology still uses an x86 frontend processor, but this frontend processor works in concert with an FPGA fabric that takes on the lion’s share of the processing work. The fabric is abstracted from end users, who only have to concern themselves with the result of an analytics query that will likely process in half the time it took before.
Semantic layers on top of Hadoop make user access easier, faster and more secure. One such method is the online analytical processing (OLAP) cube, a logical interface to raw Hadoop that enables business analytics users to produce their own queries without having to worry about the underlying complexities of Hadoop. With a logical semantic layer on top of raw Hadoop, a business user without an IT background can obtain answers to many queries unaided in five seconds or less.
New middleware messaging protocols like data distribution service (DDS) are also coming into use. DDS can both publish and accept data from multiple sources in real time, and then transport that data in real time "moving databases" that enable companies to act upon it immediately.
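The publish and subscribe pattern behind this kind of middleware can be sketched in a few lines. What follows is an in-process toy, not the DDS API: the class name `TopicBus`, the topic string, the container IDs and the 8.0 degree threshold are all hypothetical, and a real DDS implementation adds networking, discovery and quality-of-service controls that are omitted here.

```python
from collections import defaultdict


class TopicBus:
    """Toy in-process publish/subscribe bus (illustrative only, not DDS)."""

    def __init__(self):
        # Maps each topic name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Deliver the sample to every subscriber of this topic right away.
        for callback in self._subscribers.get(topic, []):
            callback(sample)


bus = TopicBus()
alerts = []


def on_reading(sample):
    # Hypothetical rule: flag containers warmer than 8.0 degrees Celsius.
    if sample["temp_c"] > 8.0:
        alerts.append(sample["container"])


bus.subscribe("container/temperature", on_reading)
bus.publish("container/temperature", {"container": "C-17", "temp_c": 4.5})
bus.publish("container/temperature", {"container": "C-42", "temp_c": 11.2})

print(alerts)  # only the container above the threshold is flagged
```

The point of the pattern is that publishers and subscribers know only the topic, not each other, which is what lets data from many sources be acted on as soon as it is published.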
For any new technology, there are always champions and detractors, with much of the enterprise community sitting on the sidelines while industry consortia and vendors decide which methodologies will prevail. But regardless of which set of technologies prevails, mean times to decision for enterprises using analytics will get shorter, and it isn’t too soon to start visualizing business cases for real time information that can transform your company.