Comments
Will Spark, Google Dataflow Steal Hadoop's Thunder?
Laurianne
User Rank: Author
6/30/2014 | 1:07:57 PM
Great context
Great context on Spark, Doug. Does anyone weighing MapReduce's shortcomings want to chime in here?
D. Henschen
User Rank: Author
6/30/2014 | 1:40:36 PM
Re: Great context
MapReduce is clearly being reinvented. The tougher question is whether Spark can usurp Hadoop-distributor-favored tools such as Hive, Impala, and other frameworks, both established and yet to come. There's a danger for Hadoop distributors in not having a piece of the high-value-analytics action.
brunoaziza
User Rank: Apprentice
6/30/2014 | 3:11:14 PM
Re: Great context
Great article, Doug. Laurianne, I think the most obvious answer might be speed.

MapReduce is a great framework, but companies that want to do analysis at scale struggle to get answers at the "speed of business."

To give you an idea, our algorithm ran about 100X faster using Spark (we ran about 50M rows in less than 50 seconds).

If you want to know what Spark is or how you can run machine learning at scale using Spark, please feel free to read a blog post we authored here.

Analytically Yours,

Bruno
Charlie Babcock
User Rank: Author
6/30/2014 | 8:54:24 PM
Life at Big Data's pinnacle is getting hazardous
Oracle launched version 1 in 1979, with IBM's DB2 soon to follow, and the relational database has reigned supreme for 30 to 32 years. Has life at the pinnacle for a data management system such as Hadoop shrunk to 10 to 12 years? I don't believe it. Still, you can see the timeline compression going on, with intense interest followed by thought leaders producing new systems in rapid succession.
D. Henschen
User Rank: Author
7/1/2014 | 9:14:39 AM
Re: Life at Big Data's pinnacle is getting hazardous
Don't read the emergence of Spark as a death sentence for Hadoop. Spark needs a data platform like Hadoop (or Cassandra, or a durable cloud storage option like S3) to run on top of. What it might replace is the menagerie of data-analysis and processing tools that run on top of Hadoop: Hadoop MapReduce, Hive, Impala, Mahout, etc. HDFS, with its redundancy, high-availability, management, and security features, is what remains.
Li Tan
User Rank: Ninja
7/1/2014 | 9:35:03 AM
Re: Life at Big Data's pinnacle is getting hazardous
I see Spark as an enhancement to Hadoop at the higher, big data analysis level. It will not kill Hadoop, but some changes will surely happen. Parts of the Hadoop framework will get deprecated, but the foundation will remain.
souravtri
User Rank: Apprentice
7/18/2014 | 7:49:29 AM
Yes, Spark et al. are the way forward!
While Hadoop's HDFS is great for (virtually infinite) distributed storage, Hadoop's MapReduce sucks in terms of processing performance and support for easy access to data.

Spark happens to be a great step forward in mitigating the above issues, with significantly improved performance, polyglotism, and great SQL support (via Shark). It would also be interesting to see hardware advancements (DRAM) that can retain much more data in memory.

My belief is that HDFS will continue to be used for storage, and incremental improvements in the processing layer (like Spark) will strengthen real-time, fast access to data and analytics.
BigDataMercs
User Rank: Apprentice
8/23/2014 | 2:22:53 AM
Re: Great context
Be glad to. In short...

Just my thoughts...

It's like punch cards compared to today's in-memory, high-availability expectations from the business world. They don't give a shit how "cute" it is under the hood. They want tactical answers. Can DoucheHoop produce? Not really. The paradigm has already shifted...

Godspeed.
LarsF931
User Rank: Apprentice
11/25/2014 | 9:29:15 PM
Re: Great context
I think the issue with many dataflow libraries, Spark and Google Dataflow included, is a lack of tooling and collaboration features. Newcomers like dataflowanalytics.com are making a splash, allowing users to build performant dataflow apps quickly by leveraging other people's components.