Open-source Hadoop clusters running on Amazon EC2 promise scalable services for data-intensive computing.
Bringing the Hadoop MapReduce framework to its Elastic Compute Cloud (EC2) environment, Amazon today released a beta version of Amazon Elastic MapReduce. The Web service is aimed at giving analysts and researchers a more cost-effective way to process vast amounts of data.
MapReduce is a framework for using a large number of computer nodes, organized as a cluster, to tackle data-intensive analyses. In the "Map" step, the master node breaks the query up into smaller sub-analyses and distributes these to the worker nodes. In the "Reduce" step, the answers to the sub-analyses are collected and combined to yield the result. The advantage is faster, massively parallel processing. Amazon's chosen flavor of MapReduce is Apache Hadoop, an open-source, Java-based framework.
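To make the two steps concrete, here is a minimal single-machine sketch of the MapReduce pattern, using word counting as the example. It is an illustration only, not Hadoop or Elastic MapReduce code: the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, and a real cluster would run the map and reduce calls on many nodes in parallel rather than in one loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the partial counts for one word into a total."""
    return key, sum(values)


def run_job(chunks):
    """Run the whole job in-process; a cluster would spread chunks over nodes."""
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two chunks stand in for the pieces of input handed to separate nodes.
    print(run_job(["the cloud runs hadoop", "hadoop runs in the cloud"]))
```

Each call to map_phase depends only on its own chunk, and each call to reduce_phase depends only on one key, which is what lets a framework such as Hadoop spread the work across many machines.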
Amazon says customers can quickly provision as much or as little Elastic MapReduce capacity as required for data-intensive operations such as data mining, financial analysis, scientific simulation, machine learning, log file analysis or bioinformatics research. Amazon EC2 customers including Netflix and eHarmony were quoted in Amazon's press release on Elastic MapReduce.
"MapReduce is a key component of our matching infrastructure," stated eHarmony Vice President of Technology Joseph Essas. "Amazon Elastic MapReduce cuts down on configuration and management time, making the entire process much more efficient."
In conventional deployments, whether running on Hadoop or other MapReduce-based clusters, time-consuming setup, management and tuning are required, according to Amazon. "Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis," stated Adam Selipsky, vice president of product management and developer relations for Amazon Web Services. "Amazon Elastic MapReduce makes crunching in the cloud much easier because it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks."
The service automatically launches and configures the number and type of Amazon EC2 instances specified by customers. To assist customers in executing data-intensive applications, Amazon Web Services is providing a number of MapReduce application samples and tutorials.
Amazon Elastic MapReduce service fees of 1.5 cents to 12 cents per hour are added to the standard Amazon EC2 charges of 10 cents to 80 cents per hour, depending on data volumes. Reserved-instance pricing is also available.
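As a rough illustration of how the two charges add up, the sketch below estimates the cost of a job from the per-hour figures quoted above. The cluster size (20 instances) and run time (10 hours) are hypothetical. Pairing the lowest service fee with the lowest EC2 rate, and the highest with the highest, is an assumption made only to bracket the range. The estimate ignores storage, data transfer, reserved-instance discounts and any billing rounding.

```python
def job_cost_cents(ec2_rate_cents, emr_fee_cents, instances, hours):
    """Estimate job cost in cents: (EC2 rate + service fee) per instance-hour."""
    per_instance_hour = ec2_rate_cents + emr_fee_cents
    return per_instance_hour * instances * hours


if __name__ == "__main__":
    instances, hours = 20, 10  # hypothetical cluster size and run time

    # Low end of the quoted ranges: 10 cents EC2 + 1.5 cents service fee.
    low = job_cost_cents(10, 1.5, instances, hours)
    # High end of the quoted ranges: 80 cents EC2 + 12 cents service fee.
    high = job_cost_cents(80, 12, instances, hours)

    print(f"Low estimate:  ${low / 100:.2f}")   # $23.00
    print(f"High estimate: ${high / 100:.2f}")  # $184.00
```

Under these assumptions, the service fee adds 15 percent on top of the EC2 charge at both ends of the quoted ranges.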