With Hadoop, Big Data Analytics Challenges Old-School Business Intelligence
Datameer and Karmasphere say their Hadoop-based platforms are what's needed for the next era of data analysis. Are you buying it?
What good is a "big data" analytics system that's capped at 1 terabyte?
Hadoop is actually misperceived as a solution that's exclusively about big data, according to Groschupf, who contends that it's suitable for small deployments where variable-data analysis is required. Datameer says possible workgroup uses include the same sorts of analyses Hadoop users might contemplate--like finding correlations among clickstreams, online signups, and e-mail campaigns--but with a week's worth of data instead of a year.
Datameer Personal is $300 per year, limited to 100 gigabytes per year, and creates a mini Hadoop environment on a PC, giving power users a development and design environment to do small-scale social media analytics.
Karmasphere, too, provides a reporting, analysis and data-visualization platform for Hadoop, but there are a few crucial differences in its approach. First, instead of using a spreadsheet-style interface, Karmasphere offers a graphical user interface that powers a collaborative workflow that works with Hive, the data warehousing component built on top of Hadoop.
Karmasphere CEO Gail Ennis says Hive is standards-based, so people can move their reports to other Hadoop distributions, and it's more scalable than Datameer's spreadsheet approach. Datameer's Groschupf counters that its spreadsheet is just an analysis design interface, so it doesn't have to scale. He also says that Hive (a tool also used by established BI vendors including MicroStrategy and Tableau) lacks support and currently offers less than 30% of the analyses supported in SQL.
"Hive is a crutch compared to an EMC Greenplum, HP Vertica, or Teradata system," Groschupf says. "Those who try to make it a standard data warehouse will fail."
Ennis concedes that Hive has its flaws, but she says "the constraints are going to be addressed very soon because Hive adoption and community development work is moving quickly."
Where Datameer 2.0 introduced workgroup and personal editions, Karmasphere is moving the other way, up from desktop software to a server-based product. Thus, Karmasphere 2.0 delivers new capabilities including a collaborative workspace that runs in browsers, a shared asset repository where users can access and version control analyses, and an administrative console for managing user roles, permissions, and security. Also new in version 2.0 is the ability to import SAS and SPSS models and run them on Hadoop.
Karmasphere's pricing is based on the number of nodes in the cluster and the number of named users. Small, five-node/five-user systems start at around $10,000, and average deployments for 30- to 40-node clusters with 10 to 20 users are $40,000. Truly large-scale deployments with hundreds of nodes cost $250,000 to $300,000.
For companies building on Hadoop that aren't invested in so-called old-school BI or relational data warehousing, Datameer and Karmasphere should clearly be on the short list. If you're a SQL shop that's heavily invested in more conventional BI, it can't hurt to explore your Hadoop-integration options. Connectors to the Hadoop Distributed File System (HDFS) are commonplace. Less common are connectors to Hive, but keep your eye on growing maturity here.
There are also emerging HCatalog capabilities within Apache Hadoop software that have made it possible for data warehousing vendors including EMC and Teradata's Aster Data unit to tap Hadoop data as if it were tables in any conventional relational database.
There's always an imperative to leverage existing investments first until they're proven inadequate, so this might delay a quick embrace of Datameer and Karmasphere by BI and analytics pros deeply invested in their existing tools. Veterans also might not be very impressed by the state of evolution of these two very new products.
Can conventional BI tools match an analytic platform built for Hadoop? Groschupf says those just moving boiled-down result sets out of Hadoop and into conventional tools are simply perpetuating islands-of-transactions analysis. And tapping into Hive isn't much better because "you're losing the opportunity [for holistic analysis] because you're creating a static schema that only deals with structured information," he concludes.
The idea of holistic analysis is what the enterprise data warehouse was always about. For many, the enterprise data warehouse remains an elusive dream. Even for those who think they've achieved it, it has always been hard and expensive. We have yet to find out if "next-generation analytics" on top of Hadoop will fulfill that promise at a lower cost and across a wider variety and larger scale of data.