We sort through this week's clashing din of news from Teradata, SAS, Pivotal, Platfora and Hortonworks in search of the inside edge to big data breakthroughs.
Platfora's big data analytics and data visualization platform runs on top of Hadoop, and to speed performance it handles analysis in memory. A 3.0 release introduced on Wednesday adds Event Stream Analytics, which is the ability for users to look at a timeline of events around a common subject. For example, if customers are what you care about, you can quickly develop the timeline of ad clicks, website clicks, mobile interactions, physical store purchases and every other behavior captured in data about that customer.
"So instead of viewing all that as separate data, you can develop a timeline and you can look at behavior across time," Werther explains. "You can do funnel-chart analysis to look at the falloff through stages of the customer lifecycle. You can see if Web behavior patterns continue offline. Or you can look at the behavior of a customer after acquisition to figure out what you should pay to acquire similar customers."
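To make the timeline and funnel idea concrete, here is a minimal Python sketch. It is not Platfora's implementation or API; the event records, the field layout and the three-stage funnel are all hypothetical, chosen only to illustrate how events from separate channels can be merged into one per-customer timeline and then counted stage by stage.

```python
from collections import defaultdict

# Hypothetical events from separate channels:
# (customer_id, timestamp, channel, event_type)
EVENTS = [
    ("c1", 3, "web", "site_visit"),
    ("c1", 1, "ads", "ad_click"),
    ("c1", 7, "store", "purchase"),
    ("c2", 2, "ads", "ad_click"),
    ("c2", 5, "mobile", "site_visit"),
    ("c3", 4, "ads", "ad_click"),
]

# Hypothetical funnel: stages must occur in this order.
FUNNEL = ["ad_click", "site_visit", "purchase"]


def build_timelines(events):
    """Group events from every channel by customer and sort them by time."""
    timelines = defaultdict(list)
    for customer, ts, channel, event_type in events:
        timelines[customer].append((ts, channel, event_type))
    return {customer: sorted(evts) for customer, evts in timelines.items()}


def stages_reached(timeline, funnel):
    """Count how many funnel stages a customer completed, in order."""
    stage = 0
    for _ts, _channel, event_type in timeline:
        if stage < len(funnel) and event_type == funnel[stage]:
            stage += 1
    return stage


def funnel_counts(timelines, funnel):
    """Return the number of customers who reached each funnel stage."""
    counts = [0] * len(funnel)
    for timeline in timelines.values():
        for i in range(stages_reached(timeline, funnel)):
            counts[i] += 1
    return counts


timelines = build_timelines(EVENTS)
counts = funnel_counts(timelines, FUNNEL)
for stage_name, count in zip(FUNNEL, counts):
    print(f"{stage_name}: {count} customers")
```

With this sample data, three customers click an ad, two go on to visit the site and one makes a purchase, which is the kind of stage-by-stage falloff a funnel chart would display.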
A second Platfora 3.0 upgrade is an Entity-Centric Data Catalog that lets customers see all the data sets available within Hadoop that can be applied to an entity, such as customer, product or location. This avoids the tedious process of manually exploring and navigating raw data sets. A third upgrade Platfora announced this week is Iterative Segmentation, whereby users might take segments of, say, behavior on websites and iteratively explore how those groups respond to ads, shop in stores or switch services.
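As a rough illustration of what iterative segmentation might look like in code, the Python sketch below narrows a group of customers one step at a time and measures a behavior within each resulting segment. The customer attributes, thresholds and helper names are hypothetical; this shows the general technique, not how Platfora implements it.

```python
# Hypothetical per-customer attributes drawn from several data sets.
CUSTOMERS = {
    "c1": {"web_visits": 12, "clicked_ad": True, "store_purchases": 2},
    "c2": {"web_visits": 15, "clicked_ad": True, "store_purchases": 0},
    "c3": {"web_visits": 3, "clicked_ad": True, "store_purchases": 1},
    "c4": {"web_visits": 11, "clicked_ad": False, "store_purchases": 1},
}


def segment(customers, ids, predicate):
    """Narrow an existing segment (a set of ids) to members matching predicate."""
    return {cid for cid in ids if predicate(customers[cid])}


def rate(customers, ids, predicate):
    """Share of a segment whose members match predicate (0.0 if empty)."""
    if not ids:
        return 0.0
    return len(segment(customers, ids, predicate)) / len(ids)


# Step 1: start from everyone and keep frequent website visitors.
frequent = segment(CUSTOMERS, set(CUSTOMERS), lambda c: c["web_visits"] >= 10)

# Step 2: within that segment, keep those who responded to an ad.
ad_responders = segment(CUSTOMERS, frequent, lambda c: c["clicked_ad"])

# Step 3: ask how often the refined segment also buys in stores.
store_rate = rate(CUSTOMERS, ad_responders, lambda c: c["store_purchases"] > 0)
print(sorted(frequent), sorted(ad_responders), store_rate)
```

Each step takes the previous segment as its input, so an analyst can back up and try a different cut without starting from the raw data again.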
Platfora emerged as a company only about a year ago, and its product has been available for about seven months. The customer list is short but impressive, including Disney, Edmunds.com, Netflix, Washington Post and Shopify. Platfora is at the opposite end of the spectrum from the likes of Teradata and SAS: few customers and not much of a track record. But it's offering entirely new technology that fills lots of gaps in exploration, analysis and visualization on top of Hadoop. If you accept the thesis that big data presents bigger and different questions, Platfora would be of interest.
SAS Gives You Options
If you were to liken analytics to the car market, Platfora would be Tesla and SAS would be Mercedes. SAS says it has had a big data strategy for at least five years, but it has been focusing mostly on the volume problem. High-scale advanced analytic performance demanded a move to massively parallel processing, so seven years ago it partnered with the likes of EMC, Netezza and Teradata to handle analytic scoring processes within these grid-powered databases.
Five years ago, SAS deepened its in-database work by bringing text mining, forecasting, optimization and econometric analytic processes together in SAS High Performance Analytics, an option available through EMC, Teradata, grids or industry-standard (x86) servers specified by SAS. Two years ago, SAS introduced SAS Visual Analytics and the SAS LASR Server, which run on commodity grids or dedicated instances of the Hadoop Distributed File System.
This week SAS introduced yet more options for analytics at high scale with the announcement that SAS Scoring and SAS High Performance Analytics can now run on Cloudera, Hortonworks or generic Apache Hadoop clusters. SAS also embraced SAP Hana, a clustered, in-memory database, by adding SAS Access for Hana, planning the release of SAS Scoring on Hana and promising to work with joint customers to determine what other analytic processes or applications might make sense to SAP customers.
Running on Cloudera, Hortonworks or generic Hadoop is obviously the headline of big data interest, but do SAS analytics address the sort of analyses big data practitioners are after? Text mining is an unstructured data-analysis option, but what of "connecting the dots across multiple streams of data" as Werther described?
"There's a lot of what I call 'horizon-free thinking' about how we can do near-real time, analytically driven decision making on any and all data, but that's way out there," said Russ Cobb, VP of alliances and product marketing for SAS, in an interview with InformationWeek this week. "A lot of our customers are still trying to figure out how they can take all their corporate data, structured and unstructured, and analyze that to get new insights. More importantly, they want to drive those insights back into transactional systems and decision processes."
I suspect "horizon-free" thinkers could use SAS capabilities to develop holistic analyses of, say, customer behavior across multiple interaction points -- Web, mobile, social -- to consider customer lifetime value or what the company should pay to acquire customers with certain specified attributes. But Cobb's statement suggests that most SAS customers are focusing on internal data and predictive decisions in particular domains. At the very least, SAS isn't hung up on backend choices, so whether the data is in generic grids, high-scale databases or the leading flavor of Hadoop, SAS can analyze it -- quickly.