Cloudera CEO Mike Olson urges companies to reconsider their data-management approach as the "center of gravity" shifts toward Hadoop.
Cloudera CEO Mike Olson took the stage Tuesday at the Cloudera Forum in San Francisco to urge companies to "unaccept the status quo" of enterprise data warehousing (EDW).
"The place that enterprises will store their data is shifting toward Hadoop," Olson told InformationWeek last week in a preview of the speech. "We're seeing customers not replace, but, rather, rationalize their enterprise data warehouse investments by adding Hadoop alongside."
Olson also announced a new Cloudera Search capability built on Apache Solr, but the larger purpose of the presentation was to question assumptions about enterprise data management. EDWs are "increasingly costly and difficult to maintain," Olson said, because the volume and variety of data now encountered is "totally out of whack" with what relational data warehouses were designed to handle in the 1990s.
EDW costs are upward of $20,000 per terabyte, versus $1,000 to $2,000 per terabyte for Hadoop clusters, including hardware, according to Olson. Thus the time is right, he said, to reconsider where most data is stored, transformed, cleaned, prepared and interactively queried.
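To put that gap in concrete terms, here is a rough back-of-the-envelope sketch that applies Olson's quoted per-terabyte figures to a hypothetical 100 TB data set. The 100 TB volume and the helper name `storage_cost` are illustrative assumptions, not figures or tools from the article. Because Olson described EDW costs as "upward of" $20,000 per terabyte, the ratio it computes is a lower bound.

```python
# Back-of-the-envelope comparison using the per-terabyte figures Olson quoted.
# Assumption: the 100 TB data volume is hypothetical, chosen only for illustration.
EDW_COST_PER_TB = 20_000              # "upward of $20,000 per terabyte"
HADOOP_COST_PER_TB = (1_000, 2_000)   # low and high ends of the Hadoop range, incl. hardware


def storage_cost(terabytes: float, cost_per_tb: float) -> float:
    """Hypothetical helper: total dollars to hold `terabytes` at `cost_per_tb`."""
    return terabytes * cost_per_tb


data_tb = 100  # hypothetical data volume in terabytes

edw = storage_cost(data_tb, EDW_COST_PER_TB)
hadoop_low, hadoop_high = (storage_cost(data_tb, c) for c in HADOOP_COST_PER_TB)

print(f"EDW:    ${edw:,.0f}")
print(f"Hadoop: ${hadoop_low:,.0f} to ${hadoop_high:,.0f}")
# Dividing by the high Hadoop figure gives the smallest ratio, by the low figure the largest.
print(f"EDW costs at least {edw / hadoop_high:.0f}x to {edw / hadoop_low:.0f}x as much")
```

On these numbers the EDW bill comes to at least $2,000,000 against $100,000 to $200,000 for Hadoop, a gap of roughly 10x to 20x that holds at any data volume, since both costs scale linearly per terabyte.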
"All of those workloads are sucking cycles away from the stuff that the EDW platform does very well," Olson said, citing high-powered analytics and cube-based analyses as the key roles that EDWs will continue to handle.
The announcement of Cloudera Search expands the list of Hadoop platform capabilities. Tapping the open-source Apache Solr search engine, Cloudera said it will support natural language keyword searches and faceted navigation of data stored in the Hadoop Distributed File System (HDFS) and Apache HBase. The tool runs on the Hadoop cluster and will be useful for exploring data and finding subsets of information that might be targeted for large-scale MapReduce processing, Olson said.
"When you have petabytes of data, folders don't work anymore, and we've all learned from Google that, when you need to find some bit of information, you just go search for it," he said. "Anybody can use this and you don't need to define ontologies or taxonomies to set it up."
Cloudera Search has been in private beta for several months, with Monsanto cited as a company using the software to support a high-scale search application. The software will be distributed as part of Cloudera's Hadoop distribution, but management capabilities for search will be an add-on offering that's part of the vendor's commercial Cloudera Manager software.
Cloudera competitor MapR recently announced its own answer to search-on-Hadoop, also based on Solr, but Olson discounted it as "an announcement not supported by shipping code." Cloudera Search is now available for download as part of a public beta test that's expected to last three months.
Cloudera's premise of offloading ETL and basic BI workloads, along with the bulk of data volumes, from more expensive EDWs onto Hadoop is neither shocking nor new. The strategy of turning Hadoop into the enterprise data hub has been articulated by outspoken practitioners such as Phil Shelley, CTO at Sears Holdings.
The topic also has been openly debated by vendors, with leading database suppliers such as Teradata and IBM shrugging off Hadoop as just one more arrow in the data-management quiver that enterprises will need to address big-data opportunities.
With the costs of Hadoop what they are and the scale of data growing exponentially, there's little doubt that Hadoop's popularity will grow. Time will tell just how soon and to what extent it will displace EDWs from accustomed roles such as transforming and storing the bulk of historical data and supporting the basics of BI and reporting.