Microsoft's CEO says an inside-the-enterprise business intelligence focus misses the real opportunity for large-scale insight.
Microsoft is doubling down on "big data," one of the tech trends that's emerging as a top priority for its customers, says Steve Ballmer. But Microsoft's CEO isn't talking about the kind of internal business intelligence that he says is the focus for IBM, Oracle and other rivals.
In a day of interviews with four InformationWeek editors at Microsoft's headquarters last week, Ballmer and several of his lieutenants provided an expansive vision for what big data and the related cloud computing movement will bring. It remains to be seen whether Microsoft can translate that vision into an advantage over rivals IBM and Oracle, or, more importantly, into real value for its customers. But we heard compelling arguments for blending on-premises data and computing capacity with new resources and capabilities in the cloud.
If you think about big data through the narrow lens of large-scale data warehousing, Microsoft is the greenhorn among the likes of EMC (by way of its Greenplum acquisition), Hewlett-Packard (through its Vertica acquisition), IBM, Oracle, and Teradata. Those vendors have fielded products for the top end of the market for years, while Microsoft didn't introduce its SQL Server 2008 R2 Parallel Data Warehouse (PDW) database until late last year. Hardware-complete PDW appliances from HP and other partners weren't available until early this year.
In fact, Microsoft didn't win its first PDW customer, the Direct Edge stock exchange, until last month, as I reported in this article. At 30-plus terabytes, the Direct Edge project is larger than any we've seen on Oracle's Exadata. (BNP Paribas's deployment, which started at 23 terabytes, is the largest Exadata reference deployment we know of.)
The Direct Edge deployment won't be operational until later this year. And even at its zenith, this project won't hold a candle to the petabyte-scale deployments running on Greenplum, Netezza, and Teradata. Direct Edge says its deployment might scale up to about 200 terabytes.
So just where was Ballmer coming from when he said, "Nobody plays in big data, really, except Microsoft and Google"?
Search And Big Insight
Ballmer's perspective on big data is tied to the Bing Internet search engine, a business we heard much more about from Satya Nadella. Until a few months ago, Nadella was the senior VP in charge of engineering for Microsoft's online business, which includes search (Bing), the MSN portal, and Internet ad-serving. It says something that Nadella was Ballmer's hand-picked choice to take over as president of Microsoft's Server and Tools division in January, following the resignation of Bob Muglia.
Nadella has been at Microsoft since 1992, serving in Microsoft Business Solutions (responsible for Microsoft Dynamics applications) and on the server side of the business (working on Windows NT and other server products). During his four-plus years with Microsoft's online business, Nadella says he "relearned everything about infrastructure," something Microsoft's server business needs to do as it moves into cloud computing.
Microsoft's online operation puts big data into perspective. Bing's infrastructure comprises 250,000 Windows Server machines and manages some 150 petabytes of data. Microsoft processes two to three petabytes per day. "You really have to figure out how to process that kind of data to keep your index fresh," Nadella says.
Those interested in running apps in the cloud might dismiss Bing-related processing as being stateless -- not a continuously running component of a mission-critical app. Nadella points to Microsoft's AdCenter, which is a complicated business application with a transactional data store. All Internet ad deliveries have to be tracked, and for every search, Microsoft runs some 30,000 auctions simultaneously to re-rank the ads. "That's as stateful an app as you can get," Nadella says.