Comments
Will Spark, Google Dataflow Steal Hadoop's Thunder?
Laurianne
User Rank: Author
6/30/2014 | 1:07:57 PM
Great context
Great context on Spark, Doug. Does anyone weighing MapReduce's shortcomings want to chime in here?
D. Henschen
User Rank: Author
6/30/2014 | 1:40:36 PM
Re: Great context
MapReduce is clearly being reinvented. The tougher question is whether Spark can usurp tools favored by Hadoop distributors, such as Hive, Impala, and other frameworks, both established and yet to come. There's a danger for Hadoop distributors in not having a piece of the high-value-analytics action.
brunoaziza
User Rank: Apprentice
6/30/2014 | 3:11:14 PM
Re: Great context
Great article, Doug. Laurianne, I think the most obvious answer might be speed.

MapReduce is a great framework, but companies that want to do analysis at scale struggle to get answers at the "speed of business."

To give you an idea, our algorithm ran about 100X faster using Spark (we processed about 50M rows in less than 50 seconds).

If you want to know what Spark is or how you can run machine learning at scale using Spark, please feel free to read a blog post we authored on the subject.

Analytically Yours,

Bruno
Charlie Babcock
User Rank: Author
6/30/2014 | 8:54:24 PM
Life at Big Data's pinnacle is getting hazardous
Oracle shipped its first commercial release in 1979, IBM's DB2 soon followed, and the relational database has reigned supreme for 30-32 years. Has life at the pinnacle for a data management system such as Hadoop shrunk to 10-12 years? I don't believe it. Still, you can see the timeline compressing: intense interest is followed by thought leaders producing new systems in rapid succession.
D. Henschen
User Rank: Author
7/1/2014 | 9:14:39 AM
Re: Life at Big Data's pinnacle is getting hazardous
Don't read the emergence of Spark as a death sentence for Hadoop. Spark needs a data platform like Hadoop (or Cassandra, or a durable cloud storage option like S3) to run on top of. What it might replace is the menagerie of data-analysis and processing tools (Hadoop MapReduce, Hive, Impala, Mahout, etc.) that run on top of Hadoop. HDFS, with its redundancy, high-availability, management, and security features, is what remains.
Li Tan
User Rank: Ninja
7/1/2014 | 9:35:03 AM
Re: Life at Big Data's pinnacle is getting hazardous
I see Spark as an enhancement to Hadoop at the higher level of big data analysis. It will not kill Hadoop, but some changes will certainly happen. Parts of the Hadoop framework will get deprecated, but the foundation will remain.
souravtri
User Rank: Apprentice
7/18/2014 | 7:49:29 AM
Yes, Spark et al. are the way forward!
While Hadoop's HDFS is great for (virtually infinite) distributed storage, Hadoop's MapReduce sucks in terms of processing performance and ease of access to data.

Spark is a great step forward in mitigating these issues, with significantly improved performance, polyglot programming support, and strong SQL capabilities (via Shark). It will also be interesting to see hardware advances (DRAM) that let systems retain much more data in memory.

My belief is that HDFS will continue to be used for storage, and incremental improvements in the processing layer (like Spark) will strengthen real-time, fast access to data and analytics.