A veteran of multiple IBM software units, including a prior stint in Information Management, Picciano is tasked with revitalizing a business that has been "a little flat for the last year or two," according to Gartner analyst Merv Adrian. "No doubt their feistiness is going to be more evident," Adrian says.
Feisty is a fitting description for Picciano, who in this in-depth interview with InformationWeek is by turns effusive and dismissive. He talks at length about IBM's vision for five big data use cases while ceding nothing to database competitors SAP, MongoDB and Cassandra. Read on for the big data perspective from inside IBM.
InformationWeek: There's a growing view that companies will use Hadoop as a reservoir for big data. Everyone agrees that conventional databases will still have a role, but some see big enterprise data warehouses being displaced. What's your view?
Bob Picciano: Sometimes people drastically overuse the term big data. We've done more than 3,000 engagements with customers around our big data portfolio, and almost all of them have fallen into one of five predominant use cases: developing a 360-degree view of the customer; understanding operational analytics; addressing threat, fraud and security; analyzing information that you didn't think was useable before; and offloading and augmenting data warehouses.
Some of these use cases are more Hadoop-oriented than others. If you think about exploring new data types including a high degree of unstructured data, for example, it doesn't make sense to transform that data into structured information and put it into a data warehouse. You'd use Hadoop for that. We have an offering called Data Explorer, which is based on our Vivisimo acquisition, that helps index and categorize unstructured information so you can navigate, visualize, understand and correlate it with other things.
Operational analytics is another use case involving Hadoop. There we just delivered a new offering with our Smart Cloud and Smarter Infrastructure that focuses on helping clients to pull in and analyze log information to spot events that could be used to help improve the resiliency of operational systems.
In the case of developing a 360-degree view of customers, maybe you have a system of master data [like CRM], so you have customer data files, but how do you also include information from public or social domains?... And how do you sew together interactions on Web pages? That's very much a Hadoop data workload.
IW: IBM has a Hadoop offering (with IBM BigInsights), but so, too, do Microsoft, Oracle, Pivotal, Teradata, Cloudera and others. How does IBM stand out in the big data world?
Picciano: One of the use cases that's unique to IBM is streaming analytics. In a big data world, sometimes the best thing to do is persist your question and have the data run through that question continuously rather than finding a better place to persist the data. Hadoop is, in many ways, just like a different kind of big database. That may be insufficient to differentiate company performance on a variety of different workloads.
Data is becoming a commodity, information is becoming a commodity and even insight is becoming a commodity. What's going to become a differentiator is how fast you can develop that insight. If you have to pour data into a big data lake on Hadoop and then interrogate that information, then you have to figure out, "is this the right day to ask that question?" With streaming analytics you can ask important questions continuously.
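The idea of persisting the question rather than the data can be illustrated with a minimal Python sketch. It is not IBM InfoSphere Streams code and does not reflect that product's API. The `Reading` record, the `StandingQuery` class, the `run` helper, the two-second window and the threshold of 20 are all hypothetical, chosen only to show a question that stays resident while events flow through it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """One event on the stream (hypothetical shape, for illustration only)."""
    timestamp: float  # seconds since some arbitrary start
    value: float


class StandingQuery:
    """A persisted question: is the mean of the recent window above a threshold?

    Only the readings inside the time window are kept in memory; nothing is
    written to a store and queried later.
    """

    def __init__(self, window_seconds: float, threshold: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._window: Deque[Reading] = deque()
        self._total = 0.0

    def on_event(self, reading: Reading) -> bool:
        """Fold one new reading into the window and re-evaluate the question."""
        self._window.append(reading)
        self._total += reading.value
        # Evict readings that have aged out of the window. The newest reading
        # always survives because window_seconds is positive.
        cutoff = reading.timestamp - self.window_seconds
        while self._window[0].timestamp <= cutoff:
            self._total -= self._window.popleft().value
        return self._total / len(self._window) > self.threshold


def run(query: StandingQuery, stream: Iterable[Reading]) -> Iterator[float]:
    """Yield the timestamp of every event at which the standing query fires."""
    for reading in stream:
        if query.on_event(reading):
            yield reading.timestamp


if __name__ == "__main__":
    # Made-up sample data, not drawn from any real system.
    samples = [(0, 10), (1, 12), (2, 30), (3, 32), (4, 11), (5, 9)]
    stream = [Reading(t, v) for t, v in samples]
    print(list(run(StandingQuery(window_seconds=2, threshold=20), stream)))
```

In this sketch the answer is refreshed on every incoming event, so there is no decision to make about when to load the data and ask. A batch approach would instead store all the readings first and compute the same mean afterward.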
IW: Aspirations around the Internet of Things seem to be reinvigorating the complex event processing market. Is this the kind of analysis you're talking about?
Picciano: Yes. If you think about machine-to-machine data and areas like health care and life sciences, we've done some great work with amazing institutions like UCLA and the Hospital for Sick Children in Toronto by analyzing data in motion with IBM InfoSphere Streams. When you look at neonatal care, for example, a nurse typically comes by once an hour and writes down vital signs. That's one chart point, and they'll come back in another hour and so on. But there's so much volatility around blood oxygen levels, heart rates and respiratory rates. By streaming that information and analyzing on a constant basis, you can spot when infants are having what they call spells, which increase their susceptibility to life-threatening infections. In some instances they can also over-oxygenate babies, and when that happens they can go blind.
IW: You hear a lot of talk about real-time applications, but there seem to be far fewer real-world examples. Is real-time really in high demand?
Picciano: There are many other examples. In the telco space, providers are constantly trying to analyze call quality and spot potentially fraudulent activity. They typically do that based on call data records that they load into a warehouse on a daily basis. We're doing it in real time so there's a whole different degree of remediation for customer experience management. We can identify dropped calls and whether they were related to call quality. You can look at the profile of callers, particularly pre-paid callers, and see if they're trying to burn up their minutes. That means they're likely to churn to another carrier, but we've found that there are ways to intercede in those cases and prevent churn.