7/9/2013 03:48 PM

IBM And Big Data Disruption: Insider's View

IBM's Bob Picciano, general manager of Information Management, talks up five big data use cases and Hadoop-driven change; slams SAP HANA and NoSQL databases.

Disrupting Legacy

IW: You do acknowledge that Hadoop is emerging, but is IBM committed to bringing that platform to enterprises even if it might displace legacy data warehouse workloads?

Picciano: Hadoop will displace not just some aspects of data warehouse work; it will create disruption in the field of ETL as well.

IW: And also mainframe processing. So is IBM really going to champion Hadoop if it might displace data warehousing, ETL and mainframe workloads?

Picciano: Yes, although I would be careful to define the legacy businesses. One of the biggest businesses around the Z mainframe is around Linux and workload consolidation. As we run Hadoop on Linux, there's an opportunity to have that workload in a Z environment. In fact, we've announced the ability to put our BigInsights engine on zBX, the Z blades inside a zEnterprise cluster.

IW: What's the advantage of that approach? Isn't one of the most notable benefits of Hadoop taking advantage of low-cost commodity hardware?

Picciano: It's about handling a diversity of workloads in one environment. If you consider that Z is the system of record in most institutions, why wouldn't they also want to be able to get faster, real-time analytic views into that information? Right now companies have to move that data, on average, 16 times to get it inside a tier where they can do analysis work. We're giving them an option to shorten the synapse between transaction and insight with our IBM DB2 Analytics Accelerator (IDAA).

It makes perfect sense to do that with a data warehouse, and we're having great success where organizations are looking at their Teradata environments in comparison to the efficiency of putting an IDAA on Z. They're saying, why am I sending that data all the way over there to that expensive Teradata system? When you send queries to DB2 when the IDAA is attached, it figures out whether it's more effective to run the query with Z MIPS or whether to run it on the IDAA box.
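
To make the routing decision Picciano describes concrete, here is a minimal sketch of a cost-based router choosing between a transactional engine and an attached accelerator. The Query fields, thresholds and heuristics below are invented for illustration; this is not DB2's or IDAA's actual optimizer logic.

```python
# Toy cost-based query router, in the spirit of the DB2/IDAA decision described
# above. All heuristics and numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class Query:
    rows_scanned: int      # estimated rows the query must scan
    rows_returned: int     # estimated rows in the result
    is_analytic: bool      # large scans/aggregations vs. short lookups

def route(query: Query) -> str:
    # Short, selective lookups stay on the transactional engine: shipping them
    # to the accelerator would cost more than running them locally.
    if not query.is_analytic and query.rows_scanned < 10_000:
        return "run on DB2 with Z MIPS"
    # Scan-heavy analytic queries are cheaper on the columnar accelerator.
    if query.rows_scanned / max(query.rows_returned, 1) > 100:
        return "offload to the IDAA accelerator"
    return "run on DB2 with Z MIPS"

print(route(Query(rows_scanned=5_000_000, rows_returned=200, is_analytic=True)))
# -> offload to the IDAA accelerator
```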

IW: So you're talking about running Hadoop on mainframe, but is that evidence that IBM is willing to disrupt existing business and be an agent of change?

Picciano: If you look at our company's history, especially in the information management space, we started with hierarchical databases but we were the agent of our own change by introducing relational systems. We introduced XML-based systems and object-relational systems. Some of them had more traction than others and some of them fizzled out and never really produced much.

We think there's real value for our clients around Hadoop and data in motion. In some ways that disrupts the data warehousing market in a new way in that you're analyzing in real-time, not in a warehouse. That's very threatening to storage players because you're intelligently determining what patterns are interesting in real time as opposed to just trying to build a bigger repository. We're doing this not because we think it's intellectually stimulating but because it's valuable to customers.

IW: Is there a poster-child customer where mainframe or ETL or DB2 workloads have dramatically changed because IBM is helping them reengineer?

Picciano: General Motors is an example where CIO Randy Mott is transforming and bringing IT back into the company. He's doing that utilizing a Teradata enterprise data warehouse and a new generation of extract-load-transform capabilities using Hadoop as the transformation engine. IBM BigInsights is the Hadoop engine and we're taking our DataStage [data transformation] patterns into Hadoop.
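
As a rough illustration of the extract-load-transform pattern described here -- landing raw data in the cluster and doing the transformation inside Hadoop -- below is a minimal Hadoop Streaming mapper sketch. The pipe-delimited field layout and cleansing rules are hypothetical; this is not GM's environment or an actual DataStage job.

```python
#!/usr/bin/env python3
# Minimal sketch of an ELT-style transform run inside Hadoop via Hadoop Streaming.
# The input layout (vin|dealer|amount) is invented for illustration.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) != 3:
        continue  # drop malformed source records instead of failing the job
    vin, dealer, amount = fields
    try:
        amount = "{:.2f}".format(float(amount))  # normalize the currency field
    except ValueError:
        continue
    # Emit a cleansed, conformed record keyed by dealer for the downstream load step.
    print("\t".join([dealer.strip().upper(), vin.strip(), amount]))
```

A script like this would typically be submitted with the Hadoop Streaming jar as the mapper, with a reducer handling any aggregation before the cleansed output is loaded into the warehouse.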

IW: Upstarts are making claims about how big data is changing enterprise architectures. It makes you wonder who's driving the trends.

Picciano: I think IBM is driving, and the reason is this architecture that I've talked about where you have different analytical zones that are really effective at certain aspects of the big data problem. You can't look at it through a purist lens and say, "Hadoop will be able to do all these things," because it just cannot do all those things.

IW: There's clearly a role for multiple technologies. The question is how technology investments will change, and how quickly.

Picciano: Customer value has to be in the center of everyone's cross hairs. It's not a technology experiment or a science project. The use cases that I talked about are where customers are getting additive value because they're analyzing operational data that they couldn't analyze before. They're getting a different view of their clients that wouldn't have been economical to build and so on... When you look at what's required in each of those zones, IBM has a leadership stake in all of those areas and we're putting vigorous investment even into areas that may appear to be most disruptive, like Hadoop.

Comments
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 4:00:05 PM
re: IBM And Big Data Disruption: Insider's View
Sorry @rklopp894, I just realized that I didn't respond to your BTW comment. Mr. Picciano did not say that Netezza can't do under 50 TB at all; in fact, there are loads of PureData for Analytics systems (which many will know through the Netezza name) that are below 50 TB. Hadoop indeed plays in that petabyte space as well (and below, for that matter), and there is a tight integration between Netezza and Hadoop (not to mention IBM has its own non-forked distribution called BigInsights, a limited-use license for which comes free with Netezza). What's more, Netezza lets you execute in-database MapReduce programs, which can really bridge the gap for the right applications and provide a unified programming method across the tiers (Netezza and Hadoop).
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 3:46:37 PM
re: IBM And Big Data Disruption: Insider's View
@Lori Vanourek, please see my response to rklopp894 regarding the inefficient column partition replacement LRU algorithm that Mr. Picciano was referring to. With respect to decompression, you actually call out the difference Mr. Picciano is stating. You say that decompression "is not done until it is already in the CPU cache." And THAT IS the issue: you have to decompress the data when loading it into registers from cache so that you can evaluate the query. DB2 with BLU Acceleration doesn't decompress the data. In fact, the data stays compressed and encoded in the registers for predicate evaluation (including range predicates, not just equality) as well as join and aggregate processing. That's the clear advantage that Mr. Picciano is pointing out for DB2.
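
To make the distinction concrete, here is a toy sketch of evaluating a predicate directly on order-preserving dictionary codes, the general technique this comment is describing. The encoding and data below are invented for illustration; this is not IBM's actual BLU Acceleration format.

```python
# Toy illustration: answer a range predicate on dictionary codes without
# decompressing column values. Invented encoding, not BLU's actual format.

values = ["2013-01-05", "2013-02-11", "2013-03-02", "2013-03-30", "2013-04-18"]

# Order-preserving dictionary: sorted distinct values -> small integer codes,
# so comparing codes gives the same answer as comparing the original values.
dictionary = {v: code for code, v in enumerate(sorted(set(values)))}

# The column is stored only as codes.
encoded_column = [dictionary[v] for v in values]

# Range predicate: ship_date >= '2013-03-01'. Encode the literal once...
threshold = min((code for v, code in dictionary.items() if v >= "2013-03-01"),
                default=len(dictionary))

# ...then compare codes directly; no value is decompressed to answer the query.
matching_rows = [i for i, code in enumerate(encoded_column) if code >= threshold]
print(matching_rows)  # -> [2, 3, 4]
```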
paulzikopoulos
User Rank: Apprentice
7/22/2013 | 3:43:42 PM
re: IBM And Big Data Disruption: Insider's View
@rklopp, I think Mr. Picciano's understanding of memory usage is EXACTLY in line with the blog posting you point to. In fact, that blog posting clearly states, "in other words where there is not enough memory to fit all of the vectors in memory even after flushing everything else out… the query fails." That's EXACTLY what Mr. Picciano points out when he talks about how a client might have issues at a Qtr-end close when they start to really stress the system. From what I can tell (and DO correct me, as my wife always does, swiftly I may add, if I've read the paper you sent us to wrong), SAP HANA treats an entire column partition as the smallest unit of memory replacement in its LRU algorithm. All other vendors that I know of (including columnar ones that I've looked at) work on a much better block/page-level memory replacement algorithm. In today's Big Data world, I just find it unacceptable to require a client to have to fit all their active data into memory; I talk to enough of them that this just doesn't seem to be reality.
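
As a toy sketch of why eviction granularity matters under memory pressure -- the page-level vs. column-partition-level distinction made above -- here is a tiny cache model. The cache class, sizes and numbers are invented for illustration and do not model any vendor's actual implementation.

```python
# Toy cache: evict the least recently loaded units until a new unit fits
# (a simple stand-in for LRU). Everything here is invented for illustration.
from collections import OrderedDict

class ToyCache:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.used = 0
        self.units = OrderedDict()  # unit name -> size in MB

    def load(self, name, size_mb):
        evicted = []
        while self.used + size_mb > self.capacity and self.units:
            old, old_size = self.units.popitem(last=False)
            self.used -= old_size
            evicted.append(old)
        self.units[name] = size_mb
        self.used += size_mb
        return evicted

# Page-level replacement: bringing in a 1 MB page displaces roughly 1 MB of other pages.
pages = ToyCache(capacity_mb=4096)
for i in range(4096):
    pages.load(f"page-{i}", 1)
print(len(pages.load("page-new", 1)))          # -> 1 page evicted

# Partition-level replacement: one new 2 GB column partition displaces 2 GB at once,
# even if the query only touches a handful of rows in it.
partitions = ToyCache(capacity_mb=4096)
for i in range(2):
    partitions.load(f"partition-{i}", 2048)
print(partitions.load("partition-new", 2048))  # -> an entire 2 GB partition evicted
```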
Rob Klopp
User Rank: Apprentice
7/12/2013 | 7:36:37 PM
re: IBM And Big Data Disruption: Insider's View
Here is a description of how HANA utilizes memory (http://wp.me/p1a7GL-lo ) to better inform Mr. Picciano. This information is available to IBM via the HANA Blue Book and other resources as they are one of SAP's best partners and very active in the HANA community.

BTW: The surprise to me was that Netezza is the preferred solution for petabyte-sized solutions... but not below 50TB. I do not believe that they have a large footprint in the space above a petabyte... and Hadoop plays somewhere in that petabyte place?
LoriV01
User Rank: Apprentice
7/11/2013 | 6:35:59 PM
re: IBM And Big Data Disruption: Insider's View
Thank you Doug for your post. For clarification, SAP HANA does not need to decompress data in order to determine whether or not it fits a query. SAP HANA can select and run operations on compressed data. When data needs to be decompressed, it is not done until it is already in the CPU cache. Also, if an SAP HANA system should run low on memory, columns (selected by LRU mechanisms) are unloaded from memory down to the Data Volume (HANA-organized disks) in a manner that leverages database know-how, thus preventing the usual brutal swap activity of the OS. Of course, SAP offers scale-out capabilities with the SAP HANA platform so that customers can grow their deployments to multiple nodes, supporting multi-terabyte data sets.
DAVIDINIL
User Rank: Strategist
7/11/2013 | 5:49:13 PM
re: IBM And Big Data Disruption: Insider's View
Good piece, Doug.
D. Henschen
User Rank: Author
7/10/2013 | 10:02:01 PM
re: IBM And Big Data Disruption: Insider's View
I was surprised by Picciano's dismissive take on MongoDB and Cassandra. Oracle seems to be taking NoSQL more seriously, but then, they had Berkeley DB IP to draw from when they developed the Oracle NoSQL database. I'd note that MySQL has offered NoSQL data-access options for some time, but that hasn't curbed the rapid growth of NoSQL databases including Cassandra, Couchbase, MongoDB, Riak and others. DB2 may have NoSQL access, but cost, development speed and, frankly, developer interest in using it for Web and mobile apps just isn't the same as what we're seeing with new-era options.

I was also surprised by the idea of running Hadoop on mainframe, but then, Cray recently put Hadoop on one of its supercomputers. That's not exactly cheap, commodity hardware.