Microsoft announced last week HDInsight Server, a version of the Hadoop big data analytic framework designed to run in simpler, less expensive environments than Hadoop usually requires.
Microsoft characterized the product, now in beta, as an effort to simplify big data implementations and enable IT managers to run Hadoop on Windows machines, without having to jump through the usual hoops.
Apache.org asserts that Hadoop can run on Win32-based machines but recommends doing so only for development, not as a production-quality system.
The on-premises version of HDInsight is fully supported and certified by Microsoft and partners as a production platform. It includes an automated install-and-configuration process and links that enable business users to download subsets of Hadoop data sets so they can run their own scenarios using Excel, PowerPivot for Excel and PowerView.
Microsoft says the HDInsight preview runs on its Windows Server and Azure platform-as-a-service cloud offering. Branded Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows, the Windows versions "dramatically" lower the cost and complexity of deploying Hadoop, according to Microsoft technical fellow David Campbell.
The Hadoop framework is designed to run on implementations as small as one server, but is typically configured to run on a whole server cluster running the open source Apache Web server.
Even installed on just one node, Hadoop must be configured as a cluster within which Hadoop's own name servers and resource managers coordinate the integration of a series of Hadoop modules that manage, schedule, process, query, analyze and publish both data and analytics.
Integration with Microsoft's System Center 2012 is designed to simplify management of both the Windows Server and the Azure versions of HDInsight by allowing IT managers to tweak or control the applications using Microsoft's familiar management tools.
Though HDInsight is fully compatible with Apache Hadoop, according to Microsoft, it is designed to be more adaptable because it can be run either on-premises or in the cloud -- or in both places with connections secured through Active Directory that allow on-premises and cloud versions to exchange data and/or queries.
In addition, having HDInsight running on Windows Azure provides the same dynamic resource configuration as every other cloud service, so admins can install it as if it is running on a single server, then increase RAM, storage, CPU cycles and other resources to cover peaks in demand. They can also expand a virtual single-node version of HDInsight into a multi-node, clustered version without having to reinstall or migrate the installation to a different set of physical servers.
Both also ship with links to the U.S. Census Bureau, United Nations, Dun & Bradstreet and other data sources via the online Windows Azure Marketplace. They also allow data-crunching business users to download subsets of Hadoop data or the results of previous queries to refine, reprint or republish the results using Excel.
Both versions can also be linked with installations of SQL Server to trade data or run cross-queries on both systems, using connectors from Hortonworks, the Hadoop specialist that handled most of the porting and integration of Hadoop onto Windows.