Microsoft Brings Storm Stream Analysis To Hadoop

Microsoft Azure HDInsight adds Apache Storm real-time data processing capabilities. Machine Learning service gains free app templates.

Doug Henschen, Executive Editor, Enterprise Apps

October 16, 2014


Microsoft on Wednesday announced three notable upgrades to its big data offerings, all of which will give Azure customers new options for cloud deployment and data-analysis applications.

First, a new release of Microsoft's Azure-based HDInsight service incorporates Apache Storm capabilities for streaming-data analysis. Second, Microsoft is adding free templates for predictive applications to its Azure Machine Learning (Azure ML) service. Third, the company introduced new options to deploy and back up Hortonworks Hadoop clusters on Azure.

The embrace of Apache Storm, offered initially as public-preview support for running Storm clusters on the HDInsight service, answers the call for real-time monitoring and analysis of streaming data. Potential applications include Web clickstream analysis and Internet-of-Things style apps such as predictive maintenance, according to Eron Kelly, Microsoft's GM of product marketing, SQL Server product management.

"If I want to see real-time analysis of what's happening on my website instead of batch-process analysis of clickstreams using MapReduce, Storm is a good choice," Kelly said in a phone interview with InformationWeek. Pier 1 is a strategic partner helping Microsoft to develop use cases for Storm on HDInsight, he said.
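
To make the contrast with batch processing concrete, here is a minimal sketch of the kind of rolling clickstream count a streaming system keeps current as events arrive. It is plain Python, not Storm's API, and the class name, the 60-second window, and the sample URLs are all illustrative assumptions. It also assumes clicks arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts page views per URL over the last `window_seconds` seconds.

    Hypothetical helper for illustration only; not part of Storm or HDInsight.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, url) pairs in arrival order
        self.counts = Counter()  # current views per URL inside the window

    def add(self, timestamp, url):
        """Record one click, then drop clicks that fell out of the window."""
        self.events.append((timestamp, url))
        self.counts[url] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Remove every click that is window_seconds old or older.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]

    def top(self, n=3):
        """Return the n most-viewed URLs in the current window."""
        return self.counts.most_common(n)


def demo_clicks():
    """Feed a few made-up clicks through a 60-second window."""
    counter = SlidingWindowCounter(window_seconds=60)
    clicks = [(0, "/home"), (10, "/sale"), (20, "/sale"), (70, "/home"), (75, "/cart")]
    for timestamp, url in clicks:
        counter.add(timestamp, url)
    return dict(counter.counts)
```

The counts are updated on every click, so a dashboard can read `top()` at any moment. A batch job, by comparison, would recompute the totals from stored logs only after the fact.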

Originally developed at BackType and open-sourced by Twitter after it acquired that startup, Storm recently became a top-level Apache project with support and encouragement from contributors including Yahoo. Storm competes to some extent with Apache Spark, though the latter supports streaming data analysis as well as in-memory machine learning, SQL, graph, and R analytics.

Spark is best known for its in-memory machine learning capabilities, but Microsoft has ambitions of its own in this area via the Azure Machine Learning service, which could explain why it has been slower to embrace Spark.

Introduced in June, Azure ML is being enhanced with free application templates for building recommendation engines, analysis engines that spot products that are frequently purchased together, and anomaly detection engines for Internet-of-Things type apps like predictive maintenance.

"This is taking the Azure ML algorithms and constructing them into higher-level, lightweight applications," Kelly explained.

Once assembled and combined with customer data, these recommendation, purchased-together, and anomaly-detection services can be embedded in websites, call-center interfaces, or other applications to trigger predictive actions and recommendations.
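
As a rough illustration of what a "purchased-together" analysis computes, the sketch below counts how often pairs of products appear in the same order. It is a simple co-occurrence count in plain Python, not the Azure ML template or its algorithm, and the function names and sample orders are invented for the example.

```python
from collections import Counter
from itertools import combinations


def count_pairs(orders):
    """Count how many orders contain each unordered pair of products."""
    pair_counts = Counter()
    for items in orders:
        # De-duplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts


def bought_with(item, pair_counts, n=3):
    """Return the n products most often ordered together with `item`."""
    related = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            related[b] = count
        elif b == item:
            related[a] = count
    return related.most_common(n)


# Made-up orders, for illustration only.
SAMPLE_ORDERS = [
    ["lamp", "bulb", "shade"],
    ["lamp", "bulb"],
    ["rug", "lamp"],
    ["bulb", "shade"],
]
```

Calling `bought_with("lamp", count_pairs(SAMPLE_ORDERS), n=1)` on this sample data picks out "bulb", which shares two orders with "lamp". A production service would typically also normalize for item popularity rather than rely on raw counts.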

It's a bit of a surprise that the Hortonworks Data Platform (HDP) wasn't already certified to run on Azure. Previously you could bring your own deployment images to Azure, but with the certification of HDP 2.2 announced this week, customers will be able to choose certified Hortonworks HDP deployment images from the Azure Gallery. This will let customers quickly deploy Hortonworks clusters on Azure (whether as standalone deployments or as part of hybrid deployments mixed with on-premises capacity). Another option is backing up on-premises clusters on Azure.

"Backing up on-premises HDP clusters to Azure offers a low-cost storage option in the cloud," said Kelly. "It also lets you enhance your information with third-party data sets available on Azure, and you can use Azure ML and Power BI for analysis."

It has been one year since Microsoft introduced HDInsight. The company has since released Azure ML and offered a preview of Azure DocumentDB, a document-database service that's expected to compete against MongoDB.

It's clear that Microsoft is focused on delivering the most popular types of platforms used for big data (Hadoop and a JSON-savvy NoSQL document database). Analytical options on top -- like Azure ML and the new Storm streaming-analysis service -- are the icing on the cake.

About the Author

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.