Formerly code named Denali, Microsoft's next SQL Server release has been discussed and available as a community technology preview for several months. The company announced at the PASS Summit 2011 event that the database platform will become generally available in the first half of next year. Again, that could have been predicted.
Microsoft's public embrace of Hadoop comes just one week after rival Oracle announced that it, too, will release a distribution of Hadoop and will put the software on a Big Data Appliance built on Oracle Sun hardware. EMC announced a distribution of Hadoop in May and it followed up last month by announcing a modular appliance that can run the Greenplum database and Hadoop on the same platform.
Interest in Hadoop is driven primarily by the need to handle large volumes of loosely or inconsistently structured data such as social network feeds, Web logs, email, documents, and other text-centric information. These data types can be used for applications such as customer sentiment analysis, but they cannot be effectively managed in a relational database such as SQL Server, Oracle Database, or IBM's DB2.
"We're seeing significant changes in the data landscape, with businesses encountering more types of data--more shapes, more sizes--than ever before; to address those changes we need a new data platform," Doug Leland, general manager of product management for SQL Server, said in an interview with InformationWeek.
Microsoft will support Hadoop with an Apache-derivative distribution that will run as a service on the company's Azure cloud platform and an on-premises release that will run on Windows Server. The Azure service will debut in beta by the end of this year while the software release will follow in 2012, though the company didn't specify which quarter or even which half of next year.
Running on Windows will be a new trick for an open source platform that has heretofore run on Linux. Will Microsoft's release be free and open source? That has yet to be announced, Leland said, and there was no word on whether there would be supporting appliances on third-party hardware, as there are for the SQL Server Parallel Data Warehouse.
Leland did note that the software will be "consistent and compatible with the Apache Hadoop core." He also noted Microsoft has partnered with Hortonworks, a Yahoo spinoff that specializes in Hadoop, to help develop the software distributions and propose contributions back to the Hadoop community.
Microsoft will give customers several ways to exploit data from Hadoop. Available immediately will be final versions of previously announced Hadoop Connectors for SQL Server and the SQL Server Parallel Data Warehouse. These connectors will enable data to be passed between SQL Server and Hadoop, but data is more likely to be passed from Hadoop into SQL Server, so that the results of big-data processing jobs on Hadoop can be analyzed with familiar SQL analysis tools.
The coming software distributions (including the beta Azure service due out by year end) will add a Hive ODBC driver that will enable customers to use Microsoft's familiar business intelligence (BI) tools to analyze data directly within Hadoop. Hive is the Apache query and analysis tool for Hadoop.
SQL Server's BI capabilities will be enhanced significantly in the 2012 release by the addition of Power View, formerly code named "Crescent." Microsoft Senior Vice President Ted Kummert was expected to demonstrate the Power View data exploration and visualization capabilities on Wednesday on Apple iOS devices, including the iPad. The Power View touch capabilities won't be available until the end of 2012, by which time there just might be Windows 8-based tablet competitors. But given iPad's tablet dominance and the current lack of credible competition running on Windows, supporting iOS was a long-overdue choice Microsoft had to make.
To complement the Windows Azure Data MarketPlace, Microsoft also demonstrated a new Data Explorer tool that will make it easier to browse and use data from public-cloud data sources. The MarketPlace is accessible in 26 countries, and it offers hundreds of data sets including financial data, demographic data, and geospatial data. Data Explorer includes data-visualization components for browsing data and extract-transform-load capabilities for enhancing your own data with purchased data. Resulting new data sets can also be uploaded back to the MarketPlace for sale or free distribution.
Microsoft, like Oracle, has given itself lots of time to deliver software and services for on-premises Hadoop deployments. Microsoft's commitment to have a beta Hadoop service up and running on Azure by year end is a bit more exciting (although such services are already available on Amazon's cloud). The Hive ODBC driver and Hadoop connectors for SQL Server promise to make Hadoop accessible. It's likely that 99% of SQL Server customers will be more interested in Data Power and conventional database capabilities in the near term. But with two database giants now embracing Hadoop, it's very clear that unstructured data processing and analysis will eventually go mainstream.