Wish 4: Real-time Analysis Options
Another item on the big-data analytics wish list is real-time performance. Two startup vendors going after this opportunity are marketing analytics vendor Causata and real-time Hadoop-analysis vendor HStreaming.
For Causata, "real time" means making decisions in under 50 milliseconds. You need that kind of speed to change content, banner ads and marketing offers while your customers are still active on websites and mobile devices. Causata uses Hadoop's HBase NoSQL database for storage of marketing-related data that might include clickstreams, campaign-response data and CRM records. HBase isn't good at real-time querying, however, so Causata runs Java-based algorithms on a proprietary query engine to improve performance.
As its name hints, HStreaming relies on stream-processing technology that's similar to the event-processing engines used by financial trading operations and offered by IBM (InfoSphere Streams), Progress Software (Apama), SAP (Sybase Aleri), Tibco (Complex Event Processing) and others. HStreaming takes data directly from always-on sources such as video surveillance cameras, cell towers and sensors, and spots patterns in that data while it's still in flight. The technology also provides a form of extract, transform, load (ETL) that then loads the data into Hadoop for later analysis. HStreaming cites video surveillance, network optimization and mobile advertising as its top applications. In all three cases, real-time insight and action are a must.
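To make the in-flight idea concrete, here is a minimal sketch of the general stream-processing pattern described above: each reading is checked against a short sliding window as it arrives, and every reading is then handed to a sink for later batch analysis. This is not HStreaming's API or algorithm. The function name `process_stream`, the spike rule and the in-memory list standing in for Hadoop storage are all assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, List, Tuple


def process_stream(
    readings: Iterable[float],
    sink: List[float],
    window: int = 5,
    factor: float = 2.0,
) -> List[Tuple[int, float]]:
    """Flag readings that exceed `factor` times the mean of the previous
    `window` readings, and append every reading to `sink`.

    `sink` is a hypothetical stand-in for a load step into Hadoop.
    """
    recent = deque(maxlen=window)  # holds only the last `window` readings
    alerts = []
    for index, value in enumerate(readings):
        # Pattern check happens while the data is "in flight",
        # before it is stored anywhere.
        if len(recent) == window and value > factor * (sum(recent) / window):
            alerts.append((index, value))
        recent.append(value)
        # Load step: keep the raw reading for later batch analysis.
        sink.append(value)
    return alerts


if __name__ == "__main__":
    stored: List[float] = []
    sensor_feed = [10, 11, 9, 10, 10, 50, 10, 12]
    print(process_stream(sensor_feed, stored))
    print(len(stored))
```

The point of the sketch is the ordering: the alert is raised from a small in-memory window as the reading arrives, while storage for later analysis is a separate, later step.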
Taking a different tack, Hadoop software and support vendor MapR has announced a partnership with Informatica through which it claims it will become the first and only Hadoop software distributor capable of delivering near-real-time data streaming on the big-data platform. MapR's Hadoop distribution features a lockless storage services layer that works hand-in-hand with Informatica messaging software to continuously stream massive amounts of data into Hadoop. Couple this capability with a forthcoming SQL-on-Hadoop option such as Drill, which MapR favors, and you'll have yet another option for fast big-data analysis.