Microsoft drops its LINQ to HPC big-data processing project in favor of Apache's open source Hadoop platform.
Just one month after announcing plans to release Windows Azure- and Windows Server-based implementations of open source Apache Hadoop, Microsoft quietly changed course on LINQ to Windows High Performance Computing (HPC), codenamed "Dryad."
Microsoft has been working on Dryad for more than five years, but a November 11 post on the Windows HPC Team Blog revealed that the project has been discontinued before ever seeing a production release. The move to drop Dryad was first reported Wednesday by Microsoft-watcher Mary Jo Foley.
Dryad was intended to run big-data jobs across HPC, Microsoft's clustered server environment. But such a release would have presented a proprietary and competing alternative to Hadoop, which is rapidly emerging as the leading platform for distributed data processing.
As InformationWeek reported October 12, Microsoft's embrace of Hadoop follows in the footsteps of EMC and IBM, which each introduced their own distributions of the software this year. And on October 3, Oracle announced plans for yet another Hadoop release, this one expected next year along with a supporting Oracle Big Data Appliance.
In the face of this growing support, a single-vendor platform like Dryad faced long odds of success. "Hadoop has emerged as a great platform for analyzing unstructured data or large volumes of data at low cost," acknowledged Don Pattee, senior program manager, on the HPC Team Blog. "It also has a vibrant community of users and developers eager to innovate on this platform."
Microsoft has partnered with Hortonworks, a Yahoo! spinoff that specializes in Hadoop, to help develop software distributions for Windows Server and Windows Azure. A preview version is expected to appear on the Azure cloud computing platform by the end of this year.
Interest in Hadoop is driven by demands for high scalability (into the petabytes) at low cost, and the need to flexibly handle loosely structured data such as social network feeds, email, and documents or inconsistently structured data from clickstreams, Web logs, and sensors. These data types cannot be effectively managed in a relational database such as Microsoft SQL Server.
"We're seeing significant changes in the data landscape, with businesses encountering more types of data--more shapes, more sizes--than ever before," Doug Leland, general manager of product management for SQL Server, told InformationWeek. "To address those changes we need a new data platform."
Microsoft's software direction is now clear, but there has been no indication whether there will be supporting appliances on third-party hardware, as there are for the SQL Server Parallel Data Warehouse (PDW). Hewlett-Packard offers a PDW appliance in partnership with Microsoft, and HP is also an HPC partner. It's a safe bet that Microsoft will tap HP and possibly others to develop turnkey appliances to answer EMC's Greenplum Modular Data Computing Appliance, released in September, and Oracle's planned Big Data Appliance.