Hadoop Ecosystem Evolves: 10 Cool Big Data Projects - InformationWeek
2/26/2016
10:06 AM
Jessica Davis

Hadoop Ecosystem Evolves: 10 Cool Big Data Projects

In the 10 years since developers created Hadoop to wrangle the challenges that came with big data, the ecosystem for these technologies has evolved. The Apache Software Foundation is teeming with open source big data technology projects. Here's a look at some significant projects, and a peek at some up-and-comers.

(Image: Mikko Lemola/iStockphoto)


Managing and analyzing big data -- the exponentially growing body of information collected from social media, sensors attached to "things" in the Internet of Things (IoT), structured data, unstructured data, and everything else that can be collected -- has become a massive challenge. To tackle the task, developers have created a new set of open source technologies.

The flagship software, Apache Hadoop, an Apache Software Foundation project, celebrated its 10th anniversary last month. A lot has happened in those 10 years. Many other technologies are now part of the big data and Hadoop ecosystem, most of them also within the Apache Software Foundation.

Spark, Hive, HBase, and Storm are among the technologies that developers and organizations have created and contributed to the open source community for further development and adoption.

Some of these technologies are in production at enterprises such as Netflix and LinkedIn. They enable organizations to work with massive amounts of data in real time and turn that data around to improve services for end customers.


These big data technologies are often born within organizations that are trying to improve how big data systems work and how fast they run. They represent an evolution of the ecosystem and the next wave of open source technology, which proves that development by a community of smart participants can be better than development within a proprietary corporate environment.

This modern era of open source and big data all started with Hadoop, most often described as an open source framework for distributed storage and processing of large sets of data on commodity hardware.

"Hadoop created this center of gravity for a new data architecture to emerge," Shaun Connolly, VP of corporate strategy at Hadoop distribution company Hortonworks, told InformationWeek in an interview. "Hadoop has this ecosystem of interesting projects that have grown up around it."

And the evolution continues. New projects are accepted into the Apache Software Foundation's big data ecosystem all the time. Most recently, Apache Arrow became a Top-Level Project. Other projects may enter the ecosystem as part of the Apache Software Foundation's Incubator. IBM's SystemML machine learning engine for Spark gained acceptance as an Incubator project late last year.

There are many projects that are part of the Apache Software Foundation's big data ecosystem. Here's a look at some of the significant ones, and a peek at a few up-and-comers. Once you've reviewed our choices, let us know what you think in the comments section below. Are there any you prefer? Are there some we've missed? We'd love to hear from you.


Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor. She's passionate about the practical use of business intelligence, ...

Comments
HM,
User Rank: Moderator
2/29/2016 | 4:10:26 PM
Big data
Jessica, very insightful article. Another open source technology to mention is HPCC Systems from LexisNexis, a data-intensive super computing platform for processing and solving big data analytical problems. Its integration with Hadoop extends further capabilities providing a complete solution for data ingestion, processing and delivery. In fact, both libhdfs and webhdfs implementations are available. More at http://hpccsystems.com/h2h
pfretty,
User Rank: Ninja
2/29/2016 | 1:55:37 AM
Evolution becomes even more valuable with augmentation
Hadoop is an amazing technology with the ability to process data in a cost-effective manner. However, it's when organizations embrace augmentation tools that the real value surfaces. This was echoed in a recent IDG Research Services study where IT leaders identified key benefits of embracing augmentation tools.

 

Peter Fretty, IDG blogger for SAS
nasimson,
User Rank: Ninja
2/28/2016 | 7:23:14 PM
Such mammoth projects
From Spark to so many other things, this list is awesome. I didn't know that Hadoop was the underlying framework behind such mammoth projects. Seems like an impressive portfolio for a ten-year-old tech.