Hadoop and the Big-Data Revolution - InformationWeek

Commentary
10/2/2009
04:31 PM
Doug Henschen

Hadoop and the Big-Data Revolution

There's a revolution underway in the use of big data, and Hadoop, the open-source distributed computing system, is at the center of it. Apache Hadoop success stories and accolades were shared today by the likes of Yahoo!, Facebook, eHarmony, IBM and JP Morgan Chase at Hadoop World in New York City. Here's a sampling of highlights...

There's a revolution underway in the use of big data, and Hadoop, the open-source distributed computing system, is at the center of it. Apache Hadoop is most often associated with MapReduce data processing, but it also includes a distributed file system and subprojects including the Hive data warehouse. All of the above were the subject of success stories, accolades and palpable excitement at today's Hadoop World in New York City. Executives from Yahoo!, Facebook, eHarmony, IBM and JP Morgan Chase were here offering insight into how Hadoop is changing expectations for analysis of big data.

Sharing a few highlights from today's presentations, here's what these organizations are doing with Hadoop:

  • Yahoo!, by far the largest developer and contributor to Hadoop, uses it to analyze and improve content optimization, spam filtering, search indexing and ad optimization. Yahoo! has a 4,000-node cluster with 16 petabytes of disk space available for Hadoop analysis, and it has used this infrastructure to sort 1 petabyte of data in 16 hours (across 3,700 nodes) and 1 terabyte of data in 62 seconds (across 1,500 nodes).

  • Facebook is using Hadoop to help analyze the 4 terabytes of compressed new data added to the social networking site each day. Facebook's Hive-based data warehouse runs 7,500 jobs per day for a total of more than 80,000 compute hours. Reporting is a key task, with daily and weekly aggregations of impressions and click counts across the site. Results are reported and explored through MicroStrategy dashboards.

  • eHarmony, the online dating service, is using Hadoop processing and the Hive data warehouse to better understand and more accurately match people among its 20 million registered users.

  • IBM's Emerging Technologies unit has used Hadoop for an experimental mergers-and-acquisitions due-diligence engine. The project compared 1.4 million patent records against fourteen years' worth of Court of Appeals records to spot legal challenges on intellectual property ownership. IBM said the engine has performed in 5 minutes what would otherwise take teams of legal researchers a week to compile.

  • JP Morgan Chase presented here today describing proof-of-concept data warehousing projects that are pursuing "order of magnitude savings" using open-source Hadoop and commodity hardware rather than conventional relational databases and SMP hardware.
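
To put Yahoo!'s sort figures in perspective, here is a rough back-of-the-envelope sketch in Python that converts them into aggregate and per-node throughput. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which the presentation did not specify, and the function name `sort_throughput` is purely illustrative.

```python
# Assumption: decimal units; the source does not say which convention was used.
TB = 10**12
PB = 10**15


def sort_throughput(total_bytes, seconds, nodes):
    """Return (aggregate, per-node) sort throughput in bytes per second."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes


# Figures as reported for Yahoo!'s two sort runs.
runs = {
    "1 PB in 16 hours on 3,700 nodes": (PB, 16 * 3600, 3700),
    "1 TB in 62 seconds on 1,500 nodes": (TB, 62, 1500),
}

for label, (size, secs, nodes) in runs.items():
    agg, per_node = sort_throughput(size, secs, nodes)
    print(f"{label}: {agg / 1e9:.1f} GB/s aggregate, "
          f"{per_node / 1e6:.1f} MB/s per node")
```

Under these assumptions both runs come out at roughly 16 to 17 GB/s in aggregate, with the smaller run sustaining about twice the per-node rate of the petabyte sort.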

The Hadoop World event was presented by Cloudera, a software and professional services firm focused exclusively on Hadoop. The firm announced Cloudera Desktop, a new Web-based, friendlier (though still programmer-oriented) interface for Hadoop applications. The Desktop can be used with on-premises implementations of Hadoop or cloud-based instances hosted on Amazon EC2. Amazon executives were also on hand today to discuss use of Amazon Elastic MapReduce, which is a Web services-based implementation built on the Hadoop framework. Amazon announced a partnership whereby customers can specify Cloudera instances within Amazon Elastic MapReduce in order to secure that vendor's professional services and support.

Cloudera founder Christophe Bisciglia opened the day saying that Hadoop is fast becoming pervasive and an increasingly obvious choice not just for Web companies but for all types of companies with big-data challenges and opportunities. Judging by the enthusiasm and numbers of attendees here today (surpassing 500), the big-data revolution has swept out of Silicon Valley and is reaching mainstream corporate data centers.
