There's a lot more to Microsoft's big data strategy than a Hadoop partnership with Hortonworks. In fact, Hadoop is just the beginning of what Microsoft Technical Fellow Dave Campbell describes as "an information production line."
You don't hear much about Microsoft in the big data market, but the company wants the world to know it has big plans. In fact, it's living big data through its Bing search engine, Office 365 services and Azure cloud platform. Those businesses have helped Microsoft build deep expertise around analytics and machine learning that the company wants to bring to big data practitioners. It also has a data market on Azure, unsung in-database analytics capabilities and a High-Performance Computing platform that Campbell says will help customers speed the last mile of deep analysis.
Microsoft is mounting a PR offensive this week, using surveys and case studies to raise its big data profile. But InformationWeek wanted to see what the company actually has in the works. In this Q&A interview, Campbell, VP of product development for SQL Server, offers a closer look at Microsoft's big data thinking and what it's working on behind the scenes.
InformationWeek: What's your background and how long have you been in your current role at Microsoft?
Dave Campbell: I've been looking at our whole big data strategy for four years now, and I've been in the database industry for close to 25 years. I can unequivocally state that this is the most exciting period in my career. Seven years ago it was hard to get students excited about databases because it seemed like a solved problem. Then all hell broke loose. As we see it, it's about two opportunities for business. First there's time to insight -- how you can quickly validate or falsify a hypothesis about something. And then there's return on accessible data, with the key term being "accessible."
About two years ago I was talking to executives at one of the U.S. airlines that was in the process of being acquired by another airline. The enterprise architect I was talking to put his head in his hands and said, "Our business is horrible. The airlines are running each other into the ground, and the customers just go to Orbitz or Expedia looking for the lowest price." Then he paused and said, "We've come to realize that the only way we're going to survive is to do a better job of yield management and pricing, do a better job of rescheduling the fleet after a big storm, do a better job of fuel price hedging, and do a better job of upsell than our competitors. That requires us to do new things with data that we don't know how to do."
Success or failure depended on the ability to get an increased return on accessible data. In this case the questions were, "Where are we going to get the fuel futures pricing data?" and "Where are we going to get the meteorological model to know the probability that Logan and JFK are going to be closed tomorrow morning?" I'll come back to that, but today you see so much evidence of data being external to organizations.
IW: What is Microsoft doing to help companies take advantage of external data?
Campbell: One of the things we're working on is this notion of a data market [on Windows Azure]. But it's not just about offering data sets; it's also about analytic models and other things. I'd characterize the last 15 years as being the era of the enterprise mega applications -- the SAPs, PeopleSofts and such. These apps have encouraged data silos. We've gone through several consolidation periods where we need to glue together multiple meaningful applications into suites, but this big data opportunity is way more horizontal. You want to be able to mash up data from your business processes, systems of record, external data, everything. It's not really about the applications and appliances. It's about information production.
I had a conversation recently with executives at a large national health organization and I asked them, "What kind of valuable questions would you want to answer with information that you don't think would belong in your data warehouse?" They looked at me sideways and finally offered that they had GPS telemetry data from their ambulances. Well, can you take that data and turn it into patient response times? Can you correlate patient outcomes with those response times? Well then, maybe you can figure out how much heart attack survival rates improve with faster response times so you can optimize where you place your ambulances. That analysis might require data on population density and demographics to find the concentrations of people who are most likely to have heart attacks. Their eyes lit up as we started talking about the possibilities.
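The first two steps Campbell describes, turning telemetry into response times and relating those times to outcomes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field layout, the five-minute buckets and the function names `response_minutes` and `survival_by_bucket` are all hypothetical and do not come from the health organization or from any Microsoft product.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample records, for illustration only:
# (dispatch time, arrival time derived from GPS telemetry, patient survived?)
CALLS = [
    ("2012-01-05 08:00", "2012-01-05 08:04", True),
    ("2012-01-05 09:10", "2012-01-05 09:13", True),
    ("2012-01-05 11:30", "2012-01-05 11:37", True),
    ("2012-01-05 14:00", "2012-01-05 14:09", False),
    ("2012-01-05 16:45", "2012-01-05 16:57", False),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def response_minutes(dispatched, arrived):
    """Return the minutes elapsed between dispatch and arrival."""
    delta = datetime.strptime(arrived, TIME_FORMAT) - datetime.strptime(
        dispatched, TIME_FORMAT
    )
    return delta.total_seconds() / 60


def survival_by_bucket(calls, bucket_minutes=5):
    """Group calls into response-time buckets and return the survival rate of each."""
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [survived, total]
    for dispatched, arrived, survived in calls:
        minutes = response_minutes(dispatched, arrived)
        bucket = int(minutes // bucket_minutes) * bucket_minutes
        totals[bucket][0] += int(survived)
        totals[bucket][1] += 1
    return {bucket: s / n for bucket, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    for bucket, rate in survival_by_bucket(CALLS).items():
        print(f"{bucket}-{bucket + 5} min: {rate:.0%} survived")
```

A real analysis would need far more records and, as Campbell notes, additional data such as population density and demographics before it could say anything about where to place ambulances.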
IW: How does that get back to Microsoft's services and components for big data?
Campbell: Our strategy is about doing a great job of making the information production process easy, helping you to mash up data in different forms and then bring it into the rest of our BI platform. It's about helping people to maximize their return on all accessible data. I pushed Microsoft as much as anyone to adopt Hadoop because it had become a brand, like Kleenex. RFPs asked, "What is your Hadoop integration story?" not "What is your big data integration story?" If customers are going to have hundreds of terabytes or petabytes in Hadoop, we should see that as potential value. But the business value is not at the Hadoop layer; it's in how you can turn that data into valuable information.
If you've followed our partnership with Hortonworks, you know that we think it's important to domesticate Hadoop by making it easier to install, deploy and manage. That means deploying it with Microsoft Virtual Machine Manager, managing it with System Center, and integrating it with Active Directory to make it easy for Microsoft customers. We're working with Hortonworks to do all this as close to the trunk in Apache as we can, so that it's available for all the distributions.