What's the status of the big data revolution? Fresh clues emerged this week with Hadoop vendor Cloudera scoring a $160 million round of venture capital funding, big data analytics company Platfora getting a $38 million capital infusion, and Allied Market Research issuing an estimate that the $2 billion Hadoop ecosystem (as measured in 2013) will quickly grow to $50 billion by 2020.
Citing that heady $50 billion stat, Rob Bearden, CEO of Cloudera-rival Hortonworks, said that he expects to see "60%, 70%, 80%" of enterprise data moving into Hadoop over the coming years. Speaking at this week's GigaOM Structure Data big data event in New York, Bearden said Hadoop changes the economics of managing data, giving companies a sought-after "single platform that manages all data types and structures."
Structure keynoter Paul Maritz, CEO of EMC-spinoff Pivotal, said his company is focused on making Hadoop enterprise-ready so "mere mortals can do what the Internet giants have done with lots of data." Businesses are "starting to wake up to the opportunity," he said, citing General Electric as a case in point. GE CEO Jeffrey Immelt has changed the direction of that industry giant to seize the opportunity in the Internet of Things, which inspired its "industrial Internet" strategy, marked by connected turbines, locomotives, aircraft engines, and more. The need for big data tooling was one motivation behind GE's $105-million, 2013 investment in Pivotal.
Cutting-edge giants like GE aren't the only ones investing in big data. "We're starting to see companies reconceive themselves as data companies," Maritz observed. "When all of the consumers in the world got connected to the Internet, it enabled a radical change. As billions of devices get connected, that, too, will enable radical change, so we have to embrace it."
The big data expenditures won't go just to Hadoop providers. Exhibitors at the Structure event represented a cross-section of technologies:
- Alpine Data Labs announced support for the open source Spark technology for in-memory analysis on top of Hadoop. Spark developer and support provider Databricks has certified Alpine's implementation of the technology for machine learning and analytics.
- HP Vertica is in partnership with multiple Hadoop vendors (most recently MapR), but with its recent Vertica 7 release it introduced Flex Zone, which sounds like a lightweight alternative to Hadoop. Flex Zone is built on commodity hardware. Its nodes can store structured or semi-structured data. It supports schema-on-read analysis, meaning you just load data without having to create a schema in advance or use ETL to load it. Flex Zone is deployed and managed with the same tools used for Vertica, and it's queried with SQL (or in-database R or Java-based algorithms). Flex Zone does not support unstructured data (like images or audio files) or MapReduce processing as Hadoop does. But you won't have to learn Pig or MapReduce, and its storage costs are said to be roughly in line with Hadoop's.
- MetaScale, the big data consulting and services firm spun out of Sears, highlighted a new managed-services program through which it can take over the management and administration of Hadoop clusters and other big data infrastructure that's already in use. It does so using remote-monitoring capabilities. For companies that have yet to deploy big data infrastructure, MetaScale offers Hadoop and NoSQL appliances that are prewired for its remote-management services. The idea here is to get around the big data talent shortage and speed deployments by tapping MetaScale's experience in big data deployments and its economies of scale in managing infrastructure.
- New Relic, a Web- and mobile-application monitoring company, this week announced new Insight analytics capabilities within its platform. The idea is to go beyond monitoring app performance and to start collecting and analyzing application data, such as customer names, ages, subscription levels, product selections, and other attributes that might be used for up-selling, cross-selling, and customer segmentation. Think Splunk-meets-application-monitoring, but in this case the audience is developers who can exploit the tools to build more intelligence into their Web and mobile apps.
- Paxata won a Best Analytics Startup award at Strata for its Adaptive Data Preparation platform, which runs on Hadoop or in the cloud. Geared to business analysts, the platform supports merging, cleaning, enriching, and otherwise shaping raw data sets into information that's ready for business intelligence and analytics. The data-management tools bridge the gap between information-management professionals and data scientists -- the people who do all the heavy-duty coding and data-management work -- and the business users who demand novel combinations of data and new reports. The analysts in between have lacked tools for working efficiently with data, according to Paxata.
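For readers unfamiliar with the schema-on-read idea mentioned in the HP Vertica item above, here is a minimal Python sketch of the general concept. It is not Vertica's Flex Zone API or its SQL syntax; the sample records and the `load` and `query` helpers are hypothetical and exist only for illustration. Records are stored exactly as they arrive, and field names and types are applied only when the data is read.

```python
import json

# Hypothetical semi-structured records; the fields vary from row to row.
RAW_LINES = [
    '{"user": "ann", "age": 34, "plan": "pro"}',
    '{"user": "bob", "plan": "free", "device": "mobile"}',
    '{"user": "cy", "age": "41"}',
]


def load(lines):
    """Load step: keep each record as-is, with no schema declared up front."""
    return [json.loads(line) for line in lines]


def query(records, schema):
    """Read step: apply a schema (field name -> type) only at query time.

    Fields missing from a record come back as None; fields that are
    present are cast to the requested type.
    """
    rows = []
    for rec in records:
        row = {}
        for field, cast in schema.items():
            value = rec.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


store = load(RAW_LINES)
ages = query(store, {"user": str, "age": int})
```

The same stored records could later be read with a different schema, say `{"user": str, "device": str}`, without reloading anything, which is the tradeoff the item describes: less work at load time, more interpretation at query time.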
You might think of some or all of these vendors as disruptors, but Shaun Connolly, Hortonworks VP of corporate strategy, says data is what's disrupting the datacenter, not Hadoop, NoSQL databases, or any other technology or group of vendors. It's the masses of data generated by new devices, applications, digital services, sensors, interaction modes, and more. New technologies and platforms weren't just invented by new vendors who wanted a piece of old IT budgets. They were invented to solve new problems that weren't well addressed by the old tools.