Hadoop World NYC Highlights Budding Alternative for Big Data - InformationWeek

01:20 PM

East Coast event highlights growing, mainstream adoption of open-source software designed for terabyte- to petabyte-scale data processing.

"Our storage footprint tripled between 2007 and 2009... so why wouldn't we consider Hadoop?"

This testimony, shared by Sih Lee of JP Morgan Chase, pretty much sums up the running theme at last week's Hadoop World New York City. We're entering a petabyte era, so organizations of all kinds are looking for alternatives to handle 'big data' processing challenges. (See the influencers and read what they're saying in our accompanying "Hadoop World NYC Image Gallery.")

Hadoop is an open-source software project that was originally based on MapReduce processing principles articulated in a Google white paper published in 2004. The project has since flourished and expanded beyond MapReduce to add subprojects, including the Hadoop Distributed File System (HDFS); Pig data flow language; the HBase distributed, column-oriented database; and the Hive distributed data warehouse.
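
For readers new to the MapReduce idea, the pattern can be sketched in a few lines of plain Python. This is only an illustration of the map, group and reduce steps on a single machine, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for this example. A real cluster runs the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence in a single process (illustration only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input, standing in for records spread across a cluster.
counts = word_count(["big data big clusters", "big storage"])
print(counts)
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be split across many commodity machines, which is the property the rest of this article is about.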

Web-based companies have led Hadoop adoption, and Yahoo!, Amazon, Facebook and eHarmony executives were on hand at Hadoop World NYC to extol the software's virtues and share details of their deployments. The key point of the event, however, was to highlight and encourage mainstream adoption.

"Hadoop is now everywhere and it's not just for Web companies, it's for all types of companies," stressed Christophe Bisciglia, founder of Cloudera, the Hadoop-focused professional services firm that organized the event.

The testimony of JP Morgan Chase’s Lee helped prove Bisciglia's point about mainstream corporate adoption. Lee, a vice president responsible for "Firmwide Innovation & Shared Services Strategy," said the firm has been exploring Hadoop for more than 18 months. It now has several proof-of-concept projects in the pipeline, seeking cost efficiencies over conventional technologies such as storage area networks, network-attached storage and symmetric multiprocessor hardware.

"Hadoop gives us a cost proposition that is an order of magnitude more cost efficient than some of the competing technologies," he said. "Another driver for considering Hadoop is choice... Having a single-vendor technology lock-in does not help us form a sound strategy overall. The ability to embrace a new technology such as Hadoop gives us another option from which to make sound decisions and choices."

Lee positioned MapReduce and the Hadoop Distributed File System generically as an alternative for petabyte-scale, relatively high-latency data processing, though he declined to detail specific applications at the financial services firm. Offering much more information, Facebook described its Hive-based data warehouse implementation in detail and eHarmony discussed the advantages of cloud-based MapReduce processing in preparation for internal data warehouse analysis.

Cloudera describes Hadoop as a complement to, rather than a replacement for, existing systems:

Hadoop is not a database nor does it need to replace any existing data systems you may have. Hadoop augments these systems by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data so existing systems can focus on what they were designed to do, whether that is serving real-time transactional data or providing interactive business intelligence.

Many Hadoop instances (and certainly most of the largest-scale ones) are homegrown deployments built on commodity hardware. A few commercial vendors have embraced Hadoop. Aster Data Systems, for instance, supports both SQL- and Hadoop-based MapReduce, and last week it introduced a connector for separate Hadoop instances (built on Aster or other platforms). Vertica also has a connector for Hadoop-based MapReduce implementations.

Amazon has brought Hadoop-based MapReduce to the cloud through its Elastic MapReduce Web service on EC2, and last week it added support for the Hadoop Hive distributed data warehouse.

Judging by the strong turnout, with some 500 developers and advocates in attendance, it's clear that Hadoop is part of a disruptive wave of technologies emerging for big data problems. Mainframes, conventional storage systems and proprietary data management software will bear the brunt of the impact.
