Comments
Will Spark, Google Dataflow Steal Hadoop's Thunder?
BigDataMercs,
User Rank: Apprentice
8/23/2014 | 2:22:53 AM
Re: Great context
Be glad to. In short.... 

just my thoughts... 

It's the punch-card analogy to today's in-memory, high-availability expectations from the business world. They don't give a shit how "cute" it is under the hood; they want tactical answers. Can DoucheHoop produce? Not really. The paradigm has shifted already...

 

GodSpeed. 
souravtri,
User Rank: Apprentice
7/18/2014 | 7:49:29 AM
Yes, Spark et al. are the way forward!
Hadoop's HDFS is great for (virtually infinite) distributed storage, but Hadoop's MapReduce sucks in terms of processing performance and support for easy access to data.

Spark happens to be a great step forward in mitigating the above issues, with significantly improved performance, polyglot language support, and good SQL support (with Shark). It would also be interesting to see hardware advancements (DRAM) that can retain much more data in memory.

My belief is that HDFS will continue to be used for storage, and incremental improvements in the processing layer (like Spark) will strengthen real-time, fast access to data and analytics.

 

 
Li Tan,
User Rank: Ninja
7/1/2014 | 9:35:03 AM
Re: Life at Big Data's pinnacle is getting hazardous
I see Spark as a kind of enhancement to Hadoop at a higher level of big data analysis. It will not kill Hadoop, but some changes will surely happen. Some things in the Hadoop framework will be deprecated, but the foundation will remain.
D. Henschen,
User Rank: Author
7/1/2014 | 9:14:39 AM
Re: Life at Big Data's pinnacle is getting hazardous
Don't read the emergence of Spark as a death sentence for Hadoop. Spark needs a data platform like Hadoop (or Cassandra or a durable cloud storage option like S3) to run on top of. What it might replace is the menagerie of data-analysis and processing tools -- Hadoop MapReduce, Hive, Impala, Mahout, etc. -- that run on top of Hadoop. HDFS, with redundancy, high availability, management, and security features, is what remains.
Charlie Babcock,
User Rank: Author
6/30/2014 | 8:54:24 PM
Life at Big Data's pinnacle is getting hazardous
Oracle launched its first commercial release in 1979, with IBM's DB2 soon to follow, and the relational database has reigned supreme for 30-32 years. Has life at the pinnacle for a data management system, such as Hadoop, shrunk to 10-12 years? I don't believe it. Still, you can see the timeline compression going on, with intense interest followed by thought leaders producing new systems in rapid succession.
brunoaziza,
User Rank: Apprentice
6/30/2014 | 3:11:14 PM
Re: Great context
Great article, Doug. Laurianne - I think the most obvious answer might be speed.

MapReduce is a great framework, but companies that want to do analysis at scale struggle to get answers at the 'speed of business'.

To give you an idea, our algorithm ran about 100X faster using Spark (we processed about 50M rows in less than 50 seconds).

If you want to know what Spark is or how you can run Machine Learning at scale using Spark, please feel free to read a blog post we authored here

Analytically Yours,

Bruno
D. Henschen,
User Rank: Author
6/30/2014 | 1:40:36 PM
Re: Great context
MapReduce is clearly being reinvented. The tougher question is whether Spark can usurp Hadoop-distributor-favored tools such as Hive, Impala, and other frameworks established and yet to come. There's a danger for Hadoop distributors in not having a piece of the high-value-analytics action.
Laurianne,
User Rank: Author
6/30/2014 | 1:07:57 PM
Great context
Great context on Spark, Doug. Anyone weighing the MapReduce shortcomings want to chime in here?

