Open-source Hadoop clusters running on Amazon EC2 promise scalable services for data-intensive computing.
Bringing the Hadoop MapReduce framework to its Elastic Compute Cloud (EC2) environment, Amazon today released a beta version of Amazon Elastic MapReduce. The Web service is aimed at giving analysts and researchers a more cost-effective way to process vast amounts of data.
MapReduce is a framework for using a large number of computer nodes, organized as a cluster, to tackle data-intensive analyses. In the "Map" step, the master node breaks the query up into smaller sub-analyses and distributes these to the many nodes. In the "Reduce" step, the master node collects the answers to the sub-analyses and combines them to yield the result. The advantage is faster, massively parallel processing. Amazon's chosen flavor of MapReduce is Apache Hadoop, an open-source, Java-based framework.
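To make the two steps concrete, here is a minimal single-process sketch of the pattern in Python, counting words across chunks of text. It is an illustration only: it does not use Hadoop or Amazon Elastic MapReduce, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, each chunk would be handled by a different node in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key so each word's counts end up together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the partial counts for each word into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output.

    In this sketch the chunks are processed one after another in a single
    process; a cluster would run the map calls on many nodes at once.
    """
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["the cloud is elastic", "the cluster is in the cloud"])
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map work can be spread across as many nodes as there are chunks, which is where the speedup comes from.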
Amazon says customers can quickly provision as much or as little Elastic MapReduce capacity as required for data-intensive operations such as data mining, financial analysis, scientific simulation, machine learning, log file analysis or bioinformatics research. Amazon EC2 customers including Netflix and eHarmony were quoted in Amazon's press release on Elastic MapReduce.
"MapReduce is a key component of our matching infrastructure," stated eHarmony Vice President of Technology Joseph Essas. "Amazon Elastic MapReduce cuts down on configuration and management time, making the entire process much more efficient."
In conventional deployments, whether running on Hadoop or other MapReduce-based clusters, time-consuming setup, management and tuning are required, according to Amazon. "Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis," stated Adam Selipsky, vice president of product management and developer relations for Amazon Web Services. "Amazon Elastic MapReduce makes crunching in the cloud much easier because it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks."
The service automatically launches and configures the number and type of Amazon EC2 instances specified by customers. To assist customers in executing data-intensive applications, Amazon Web Services is providing a number of MapReduce application samples and tutorials.
Amazon Elastic MapReduce service fees of 1.5 cents to 12 cents per hour are added to the standard Amazon EC2 charges of 10 cents to 80 cents per hour, depending on the instance type. Reserved-instance pricing is also available.
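As a rough illustration of how the two charges add up, the sketch below estimates the cost of a job from the rates quoted above. It rests on assumptions not stated in the article: that both charges are billed per instance per hour, and that the lowest service fee pairs with the lowest EC2 rate and the highest with the highest. The function name `estimate_cost` and the 10-instance, 5-hour job are hypothetical, and storage and data-transfer charges are ignored.

```python
from decimal import Decimal


def estimate_cost(instances, hours, ec2_rate, emr_fee):
    """Return the estimated job cost in dollars.

    Assumes (not confirmed by the article) that the EC2 charge and the
    Elastic MapReduce fee are both billed per instance per hour.
    """
    return instances * hours * (Decimal(ec2_rate) + Decimal(emr_fee))


# Hypothetical job: 10 instances running for 5 hours.
low = estimate_cost(10, 5, "0.10", "0.015")   # lowest quoted rates
high = estimate_cost(10, 5, "0.80", "0.12")   # highest quoted rates
print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

Under those assumptions, the same 50 instance-hours would cost between $5.75 and $46.00, so the choice of instance type would dominate the bill rather than the service fee.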