Comments
IBM And Big Data Disruption: Insider's View
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 4:00:05 PM
re: IBM And Big Data Disruption: Insider's View
Sorry @rklopp894, I just realized that I didn't respond to your BTW comment. Mr. Picciano did not say that Netezza can't do under 50 TB at all; in fact, there are loads of PureData for Analytics systems (which many will know by the Netezza name) that are below 50 TB. Hadoop indeed plays in that petabyte space as well (and below, for that matter), and there is tight integration between Netezza and Hadoop (not to mention IBM has its own non-forked distribution called BigInsights, a limited-use license for which comes free with Netezza). What's more, Netezza lets you execute in-database MapReduce programs, which can really bridge the gap for the right applications and provide a unified programming method across the tiers (Netezza and Hadoop).
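For readers who haven't seen the MapReduce programming model the comment refers to, here is a minimal, generic sketch in plain Python. It is not the Netezza or BigInsights API; it only illustrates the map/shuffle/reduce pattern that, per the comment, can be applied on either tier.

```python
# Illustrative only: a generic map/reduce pass in plain Python.
# This is NOT the Netezza or BigInsights API; it just sketches the
# MapReduce programming model the comment describes as usable
# in-database or on Hadoop against the same logic.
from collections import defaultdict

def map_phase(records):
    # Emit (key, value) pairs; here, count words per record.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Group values by key and aggregate them.
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

if __name__ == "__main__":
    sample = ["Netezza and Hadoop", "Hadoop scales out"]
    print(reduce_phase(map_phase(sample)))
    # {'netezza': 1, 'and': 1, 'hadoop': 2, 'scales': 1, 'out': 1}
```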
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 3:46:37 PM
re: IBM And Big Data Disruption: Insider's View
@Lori Vanourek, please see my response to rklopp894 regarding the inefficient column-partition replacement LRU algorithm that Mr. Picciano was referring to. With respect to decompression, you actually call out the difference Mr. Picciano is stating. You say that decompression "is not done until it is already in the CPU cache." And THAT IS the issue: you have to decompress the data when loading it into registers from cache so that you can evaluate the query. DB2 with BLU Acceleration doesn't decompress the data. In fact, the data stays compressed and encoded in the registers for predicate evaluation (including range predicates, not just equality) as well as join and aggregate processing. That's the clear advantage that Mr. Picciano is pointing out for DB2.
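To make the distinction concrete, here is a minimal sketch of order-preserving dictionary encoding in plain Python, the general technique that lets a range predicate be evaluated on encoded values without decompressing them. The data and function names are made up for illustration; this is not a description of DB2 BLU's actual internals.

```python
# Illustrative only: a toy order-preserving dictionary encoding showing how
# a range predicate can be evaluated directly on encoded values. This is a
# sketch of the general technique, not of any vendor's implementation.

def build_dictionary(values):
    # Sorted distinct values -> codes. Sorting makes the codes
    # order-preserving, so comparing codes gives the same answer
    # as comparing the original values.
    ordered = sorted(set(values))
    return {v: code for code, v in enumerate(ordered)}

def encode(values, dictionary):
    return [dictionary[v] for v in values]

def range_scan(encoded_column, dictionary, low, high):
    # Translate the predicate bounds into codes once (bounds assumed to
    # exist in the dictionary for simplicity)...
    low_code, high_code = dictionary[low], dictionary[high]
    # ...then evaluate the predicate on the still-encoded column.
    return [i for i, code in enumerate(encoded_column)
            if low_code <= code <= high_code]

if __name__ == "__main__":
    prices = [19, 42, 7, 88, 42, 23]
    d = build_dictionary(prices)
    col = encode(prices, d)
    # Rows where 19 <= price <= 42, found without decoding a single value.
    print(range_scan(col, d, 19, 42))   # [0, 1, 4, 5]
```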
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 3:43:42 PM
re: IBM And Big Data Disruption: Insider's View
@rklopp, I think Mr. Picciano's understanding of memory usage is EXACTLY in line with the blog posting you point to. In fact, that blog posting clearly states, "in other words where there is not enough memory to fit all of the vectors in memory even after flushing everything else out… the query fails." That's EXACTLY what Mr. Picciano points out when he talks about how a client might have issues at a quarter-end close when they start to really stress the system. From what I can tell (and DO correct me if I've read the paper you sent us to wrong; my wife always does, swiftly I may add), SAP HANA uses an entire column partition as the smallest unit of memory replacement in its LRU algorithm. All other vendors that I know of (including columnar ones that I've looked at) work on a much better block/page-level memory replacement algorithm. In today's Big Data world, I just find it unacceptable to require a client to fit all their active data into memory; I talk to enough of them that this just doesn't seem to be reality.
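To illustrate why the granularity of the replacement unit matters, here is a toy LRU cache in plain Python, run once with page-sized entries and once with a whole column partition as the smallest evictable unit. The sizes and names are invented for illustration; this is not a model of HANA's (or any vendor's) actual memory manager.

```python
# Illustrative only: the same LRU policy behaves very differently depending
# on eviction granularity. All numbers and names are made up.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.used = 0
        self.entries = OrderedDict()   # name -> size_mb, oldest first

    def touch(self, name, size_mb):
        # Bring the entry in (or mark it recently used), evicting the
        # least-recently-used entries until the new entry fits.
        if name in self.entries:
            self.entries.move_to_end(name)
            return []
        evicted = []
        while self.used + size_mb > self.capacity and self.entries:
            old, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old)
        self.entries[name] = size_mb
        self.used += size_mb
        return evicted

if __name__ == "__main__":
    # Page-level replacement: bringing in 16 MB displaces only ~16 MB of pages.
    pages = LRUCache(capacity_mb=64)
    for i in range(4):
        pages.touch(f"col_A/page_{i}", 16)
    print(pages.touch("col_B/page_0", 16))        # evicts one 16 MB page

    # Column-partition replacement: the same 16 MB need displaces 64 MB,
    # because the whole partition is the smallest unit that can be dropped.
    partitions = LRUCache(capacity_mb=64)
    partitions.touch("col_A/partition_0", 64)
    print(partitions.touch("col_B/partition_0", 16))  # evicts the entire partition
```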
Rob Klopp
User Rank: Apprentice
7/12/2013 | 7:36:37 PM
re: IBM And Big Data Disruption: Insider's View
Here is a description of how HANA utilizes memory (http://wp.me/p1a7GL-lo ) to better inform Mr. Picciano. This information is available to IBM via the HANA Blue Book and other resources as they are one of SAP's best partners and very active in the HANA community.

BTW: The surprise to me was that Netezza is the preferred solution for petabyte-sized solutions... but not below 50TB. I do not believe that they have a large footprint in the space above a petabyte... and Hadoop plays somewhere in that petabyte place?
LoriV01
User Rank: Apprentice
7/11/2013 | 6:35:59 PM
re: IBM And Big Data Disruption: Insider's View
Thank you, Doug, for your post. For clarification, SAP HANA does not need to decompress data in order to determine whether or not it fits a query. SAP HANA can select and run operations on compressed data. When data needs to be decompressed, it is not done until it is already in the CPU cache. Also, should an SAP HANA system run short on memory, columns (selected by LRU mechanisms) are unloaded from memory down to the Data Volume (HANA-organized disks) in a manner that leverages database know-how, thus preventing the usual brutal swap activity of the OS. Of course, SAP offers scale-out capabilities with the SAP HANA platform so that customers can grow their deployments to multiple nodes, supporting multi-terabyte data sets.
DAVIDINIL
User Rank: Strategist
7/11/2013 | 5:49:13 PM
re: IBM And Big Data Disruption: Insider's View
Good piece, Doug.
D. Henschen
User Rank: Author
7/10/2013 | 10:02:01 PM
re: IBM And Big Data Disruption: Insider's View
I was surprised by Picciano's dismissive take on MongoDB and Cassandra. Oracle seems to be taking NoSQL more seriously, but then, they had Berkeley DB IP to draw from when they developed the Oracle NoSQL database. I'd note that MySQL has offered NoSQL data-access options for some time, but that hasn't curbed the rapid growth of NoSQL databases including Cassandra, Couchbase, MongoDB, Riak and others. DB2 may have NoSQL access, but cost, development speed and, frankly, developer interest in using it for Web and mobile apps just isn't the same as what we're seeing with new-era options.

I was also surprised by the idea of running Hadoop on mainframe,
but then, Cray recently put Hadoop on one of its supercomputers. That's not
exactly cheap, commodity hardware.

