Commentary by Doug Henschen, 3/20/2014

Big Data Reaches Inflection Point

Enterprises see the light on big data opportunities. It's only a matter of time before mainstream data-management environments evolve.

What's the status of the big data revolution? Fresh clues emerged this week with Hadoop vendor Cloudera scoring a $160 million round of venture capital funding, big data analytics company Platfora getting a $38 million capital infusion, and Allied Market Research issuing an estimate that the $2 billion Hadoop ecosystem (as measured in 2013) will quickly grow to $50 billion by 2020.

Citing that heady $50 billion stat, Rob Bearden, CEO of Cloudera-rival Hortonworks, said that he expects to see "60%, 70%, 80%" of enterprise data moving into Hadoop over the coming years. Speaking at this week's GigaOM Structure Data event in New York, Bearden said Hadoop changes the economics of managing data, giving companies a sought-after "single platform that manages all data types and structures."

Structure keynoter Paul Maritz, CEO of EMC-spinoff Pivotal, said his company is focused on making Hadoop enterprise-ready so "mere mortals can do what the Internet giants have done with lots of data." Businesses are "starting to wake up to the opportunity," he said, citing General Electric as a case in point. GE CEO Jeffrey Immelt has changed the direction of that industry giant to seize the opportunity in the Internet of Things, which inspired its "industrial Internet" strategy, marked by connected turbines, locomotives, aircraft engines, and more. The need for big data tooling was one motivation behind GE's $105 million investment in Pivotal in 2013.

[Want more on Pivotal's latest moves? Read Pivotal Brings In-Memory Analysis To Hadoop.]

Cutting-edge giants like GE aren't the only ones investing in big data. "We're starting to see companies reconceive themselves as data companies," Maritz observed. "When all of the consumers in the world got connected to the Internet, it enabled a radical change. As billions of devices get connected, that, too, will enable radical change, so we have to embrace it."

[Photo: Hortonworks CEO Rob Bearden]

The big data expenditures won't go just to Hadoop providers. Exhibitors at the Structure event represented a cross-section of technologies:

  • Alpine Data Labs announced support for the open source Spark technology for in-memory analysis on top of Hadoop. Spark developer and support provider Databricks has certified Alpine's implementation of the technology for machine learning and analytics. (The Spark sketch after this list shows the in-memory pattern involved.)
  • HP Vertica has partnered with multiple Hadoop vendors (most recently MapR), but with its recent Vertica 7 release it introduced Flex Zone, which sounds like a lightweight alternative to Hadoop. Flex Zone runs on commodity hardware, and its nodes can store structured or semi-structured data. It supports schema-on-read analysis, meaning you simply load data without creating a schema in advance or running it through ETL. Flex Zone is deployed and managed with the same tools used for Vertica, and it's queried with SQL (or with in-database R or Java-based algorithms); see the Flex Zone sketch after this list. Flex Zone does not support unstructured data (like images or audio files) or MapReduce processing as Hadoop does. But you won't have to learn Pig or MapReduce, and its storage costs are said to be roughly in line with Hadoop's.
     
  • MetaScale, the big data consulting and services firm spun out of Sears, highlighted a new managed-services program through which it can take over the management and administration of Hadoop clusters and other big data infrastructure that's already in use, using remote-monitoring capabilities. For companies that have yet to deploy big data infrastructure, MetaScale offers Hadoop and NoSQL appliances that are prewired for its remote-management services. The idea here is to get around the big data talent shortage and speed deployments by tapping MetaScale's deployment experience and its economies of scale in managing infrastructure.
     
  • New Relic, a Web- and mobile-application monitoring company, this week announced new Insights analytics capabilities within its platform. The idea is to go beyond monitoring app performance and to start collecting and analyzing application data, such as customer names, ages, subscription levels, product selections, and other attributes that might be used for up-selling, cross-selling, and customer segmentation (see the New Relic sketch after this list). Think Splunk-meets-application-monitoring, but in this case the audience is developers, who can exploit the tools to build more intelligence into their Web and mobile apps.
     
  • Paxata won a Best Analytics Startup award at Strata for its Adaptive Data Preparation platform, which runs on Hadoop or in the cloud. Geared to business analysts, the platform supports merging, cleaning, enriching, and otherwise shaping raw data sets into information that's ready for business intelligence and analytics (the pandas sketch after this list illustrates those steps). The tools aim to bridge the gap between data scientists and information-management professionals -- the people who do the heavy-duty coding and data-management work -- and the business users who demand novel combinations of data and new reports. The analysts in between have lacked tools for working efficiently with data, according to Paxata.
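To make the Alpine item concrete, here's a minimal PySpark sketch of the in-memory pattern Spark brings to Hadoop data: load from HDFS once, cache the result in memory, and run iterative machine learning against the cached copy. It's generic Spark usage, not Alpine's or Databricks' tooling, and the HDFS path, feature layout, and cluster count are hypothetical.

    # Minimal sketch: iterative MLlib training over cached Hadoop data.
    # The HDFS path and feature format are hypothetical.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="in-memory-analysis-sketch")

    # Parse numeric feature vectors from CSV files stored in Hadoop.
    points = (sc.textFile("hdfs:///data/features/*.csv")
                .map(lambda line: [float(x) for x in line.split(",")])
                .cache())  # keep the parsed data in memory

    # Each k-means iteration rescans the cached RDD instead of
    # re-reading HDFS -- this is where in-memory analysis pays off.
    model = KMeans.train(points, k=5, maxIterations=20)
    print(model.clusterCenters)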
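Next, a hedged sketch of the Flex Zone schema-on-read flow described above, driven from Python with the vertica-python client. The connection details and file path are hypothetical, and the flex-table SQL may vary by Vertica version.

    # Schema-on-read sketch: load raw JSON first, define nothing up front.
    import vertica_python

    conn_info = {
        'host': 'vertica.example.com',  # hypothetical connection details
        'port': 5433,
        'user': 'dbadmin',
        'password': 'secret',
        'database': 'analytics',
    }

    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()

        # A flex table accepts whatever keys the loaded records contain.
        cur.execute("CREATE FLEX TABLE clickstream();")

        # Load raw JSON directly -- no ETL, no predefined schema.
        cur.execute("COPY clickstream FROM LOCAL '/data/clicks.json' "
                    "PARSER fjsonparser();")

        # Query JSON keys as if they were ordinary columns.
        cur.execute('SELECT "user_id", COUNT(*) AS views '
                    'FROM clickstream GROUP BY "user_id" '
                    'ORDER BY views DESC;')
        for user_id, views in cur.fetchall():
            print(user_id, views)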
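The New Relic item boils down to tagging monitored transactions with business attributes so they become queryable alongside performance data. The sketch below uses the Python agent's custom-parameter call; the attribute names and user structure are hypothetical, agent initialization and configuration are assumed, and the exact call may differ by agent version.

    # Hedged sketch: attach business context to a monitored transaction.
    # Assumes the New Relic agent is installed and configured.
    import newrelic.agent

    @newrelic.agent.background_task(name='checkout')
    def checkout(user):
        # These attributes ride along with the transaction event,
        # so analysts can segment by them later.
        newrelic.agent.add_custom_parameter('subscription_level', user['plan'])
        newrelic.agent.add_custom_parameter('customer_age', user['age'])
        newrelic.agent.add_custom_parameter('product', user['cart'][0])
        # ...normal checkout logic runs here...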
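Finally, Paxata itself is a visual tool, so this pandas sketch is purely illustrative of the merge/clean/enrich/shape steps the Paxata item describes; the file names and columns are invented for the example.

    # Illustrative data-prep steps; not Paxata's API.
    import pandas as pd

    # Raw extracts from two systems that share a customer key.
    orders = pd.read_csv('orders.csv')        # order_id, cust_id, amount
    customers = pd.read_csv('customers.csv')  # cust_id, region, segment

    # Clean: drop exact duplicates, normalize inconsistent casing.
    orders = orders.drop_duplicates()
    customers['region'] = customers['region'].str.strip().str.title()

    # Merge: join the two sources on the shared key.
    merged = orders.merge(customers, on='cust_id', how='left')

    # Enrich: derive a new attribute analysts can report on.
    merged['is_large_order'] = merged['amount'] > 1000

    # Shape: aggregate into a BI-ready summary table.
    summary = (merged.groupby(['region', 'segment'])
                     .agg(orders=('order_id', 'count'),
                          revenue=('amount', 'sum'))
                     .reset_index())
    print(summary.head())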

You might think of some or all of these vendors as disruptors, but Shaun Connolly, Hortonworks VP of corporate strategy, says data is what's disrupting the datacenter, not Hadoop, NoSQL databases, or any other technology or group of vendors. It's the masses of data generated by new devices, applications, digital services, sensors, interaction modes, and more. New technologies and platforms weren't just invented by new vendors who wanted a piece of old IT budgets. They were invented to solve new problems that weren't well addressed by the old tools.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
D. Henschen (Author), 3/24/2014 | 12:21:57 PM
Re: Revolution or Evolution
My latest example of a data lake user -- coming in a story next week -- is a major drug manufacturer that is running Hortonworks on Amazon's cloud. The deployment lets them study all data related to the production of drugs -- more than 10 years' worth of information, including plant-floor temperature and pressure sensor readings -- in order to spot yield variations by batch and determine how to optimize production yield. It's the sort of lighthouse implementation that shows others the way and makes adoption much more likely in the five-year time frame than the 15-year time frame. Stay tuned for the full story next week.
weckerson (Apprentice), 3/21/2014 | 9:16:25 AM
Revolution or Evolution
The estimates are high but we'll get there eventually -- 15-20 years? The data lake is an aspirational architecture that is for bleeding-edge companies only right now. We'll see slow evolution for some time to come. 
D. Henschen (Author), 3/21/2014 | 8:26:20 AM
Calculating the rate of infection
I'm told that Cloudera and Hortonworks are each logging deals at a rate of 50 to 70 per quarter -- 400 to 560 deals per year between the two of them. Throw in all the other Hadoop distributors and call it 500 to 700 per year. That's at current rates, which we can assume will accelerate. In 2013, for instance, sales teams at HP, Microsoft, SAP, and Teradata added Hortonworks' distribution to their portfolios, and we can confidently guess that having more feet on the street will increase sales. Cloudera notes that deal sizes and deployments are growing. And let's assume also that these suppliers are pursuing the biggest companies possible.

With these stats, it won't take long to see big adoption among the Global 2000. From there the question is how much serious, production-grade work are they doing on the platform and to what degree are they moving existing workloads onto Hadoop? In 2014 we're still in the early days. By 2020, I think the data-management arena is going to be very different.  
D. Henschen (Author), 3/20/2014 | 5:35:06 PM
Re: How much data moves to Hadoop?
Those were figures for 2017 into 2020. That's an eternity in a fast-moving market.
Laurianne (Author), 3/20/2014 | 5:23:16 PM
Re: How much data moves to Hadoop?
Hadoop as parking garage. Is that efficient?
tunvall01 (Apprentice), 3/20/2014 | 3:22:26 PM
Re: How much data moves to Hadoop?
Rob Bearden's numbers are extremely high. Hadoop technology and skills are way too immature to be successfully adopted in the majority of operational and production systems.
Charlie Babcock (Author), 3/20/2014 | 2:31:40 PM
Hadoop data vs. Know SQL data
Hadoop is something like a big, undifferentiated reservoir. It collects data from everywhere. The water coming out of your faucet, on the other hand, is charcoal-filtered and chlorinated. Yes, there's a lot of water in the reservoir, but remember where the stuff that's good to drink comes from. Sorry, we're having a drought here in Calif.
D. Henschen (Author), 3/20/2014 | 1:18:14 PM
Re: How much data moves to Hadoop?
The $50 billion figure is the one that sounds high. The Data Hub/Data Lake concept foresees Hadoop as a place where you store many types of data, including high-scale machine data (log files, clickstreams) that many companies haven't been collecting or keeping until recently. That you have copies of such data in a hub does NOT mean that you are necessarily replacing transactional systems or even data warehouses. It's a cheaper platform on which to gain new insights and perhaps replace some workloads. If the hub holds high-scale data and lots of historical data long since deleted from operational systems, it's not at all unrealistic to see 60%, 70%, 80% of the total data footprint in an organization being stored on Hadoop.
Laurianne (Author), 3/20/2014 | 12:52:25 PM
How much data moves to Hadoop?
"Rob Bearden, CEO of Cloudera-rival Hortonworks, said that he expects see "60%, 70%, 80%" of enterprise data moving into Hadoop over the coming years." Does this sound high to you, Doug? Sounds pretty high to me given state of tools and Hadoop talent.