What's The Best Path To Big Data Insight? - InformationWeek
11:10 AM
Doug Henschen

What's The Best Path To Big Data Insight?

We sort through this week's clashing din of news from Teradata, SAS, Pivotal, Platfora and Hortonworks in search of the inside edge to big data breakthroughs.

Pivotal Plays Enterprise Card

A future-minded spinoff of EMC, Pivotal blends cloud, application development and big data capabilities. The "cloud fabric" is based on Cloud Foundry platform-as-a-service (PaaS) software and expertise from VMware. The application-development expertise and technology come from Pivotal Labs, contributed by EMC, and from VMware's SpringSource unit. The big data and analytics capabilities blend Hadoop and EMC's Greenplum database.

The combination of Hadoop and Greenplum produced HAWQ (Hadoop with SQL), an SQL-on-Hadoop querying capability that's part of the company's Pivotal HD Hadoop distribution. This is an alternative that Pivotal says far surpasses both Hadoop's Hive component and Cloudera's Impala in performance.

Pivotal also has VMware's GemFire in-memory caching technology, which has been integrated with the Pivotal HD Hadoop distribution and introduced this week as Pivotal GemFire XD. The goal is to bring real-time, in-memory data services to Hadoop.


Pivotal freely admits that GemFire XD overlaps with Hadoop community offerings including HBase (the NoSQL database) and Spark (the Hadoop-based in-memory option), but it insists that customers are free to choose whatever components they want.

"The community is investing in technologies, and more often than not, they are good enough for the Internet companies," said Pivotal's Susheel Kaushik, senior director of technical product marketing, in an interview with InformationWeek. "When you look at enterprises, they're looking for reliability, failover, availability and standard interfaces. That's what we're providing to enterprises."

The community options will take time to evolve, Kaushik said. He vowed that Pivotal will contribute to efforts such as Spark and Storm (stream processing), but the contribution will stop short of donating assets such as GemFire XD to open source.

This week Pivotal also announced Pivotal Data Dispatch, which it describes as an iTunes-like interface for discovering data within Hadoop as well as in other data stores. You can select the data sets of interest and create a big data sandbox.

Data Dispatch offers helpful controls and insights including access, rights and data lineage. It's a management framework rather than a data store, so it's not creating copies of data. Rather, it's a Web portal to all available data for big data exploration.

The Hadoop community also is working on the problems of access controls, rights management and data lineage, but there's enough chaos (with different distributors proposing different tools) and immaturity for commercial vendors to exploit. Data Dispatch is not unlike the Entity-Centric Data Catalog introduced by Platfora, though Data Dispatch also catalogs data available outside of Hadoop.

Enterprise Vs. Internet Focus

The easiest way to understand these companies is to look at their customer bases. Where Teradata, SAS and Pivotal are clearly playing to their enterprise roots, Platfora has the clean-slate freedom to tackle those "over-the-horizon" thinkers trying to address more holistic big data opportunities. Both camps are offering commercial tools that fill gaps in current open source offerings.

At this week's Big Data Conference in Chicago, it struck me that a number of practitioners and panelists -- from ACE Group Insurance, Tenet Healthcare and ThinkBig Analytics -- said their big data teams were quite separate from preexisting BI, data warehousing and data management teams. The teams collaborate, certainly, but big data initiatives are about finding new insights and pioneering new applications, products and businesses. "If you're going to be a pioneer, you better have some wilderness survival skills," said Scott Rose, VP of services at analytics consulting firm ThinkBig Analytics.

That made me think about this week's release of Hortonworks Data Platform 2.0, with entirely open source components including HBase, ZooKeeper, Pig, Hive, HCatalog, Sqoop, Flume and Mahout. If you're going to be a big data pioneer, maybe you should be prepared to deal with many of these tools, some of which are still primitive. They're not exactly stone knives and bearskins, but nor are they as slick, feature-packed and mature as some of the commercial offerings.

If you want to be a big data settler, you might hitch your wagon to a commercial vendor such as Teradata, SAS, Pivotal or even Platfora, with enterprise-focused options promising reliability, failover, availability and standard interfaces. You'll still be ahead of the crowd back in the land of purely structured data, but you'll get some of the creature comforts of civilization.

