How is your company going to get to the promised land of big data insight? Technology vendors, as always, stand ready to be your guide, but who can really take you where you want to go? On the one hand there are the incumbents, companies like Teradata and SAS that have been solving your large-scale data problems and most advanced analytics challenges for decades. Then there are the big data startups and visionaries, including the likes of Pivotal and Platfora, that are "all in on Hadoop" and insist that big data opportunities and new platforms demand new tools. Then there's the Wild West of purely open source options.
So which option is the best route to tapping into new data sources, like clickstreams, mobile, social and sensor data, to form a more complete view of customers, products or supply chains? This week brought a fresh round of announcements from the vendors, each one offering its own view of the best path to big data opportunity. Here's a rundown on the themes and promises behind all the similar-sounding big data buzzwords.
Teradata Wants To Do It All
Teradata offers a Hadoop software distribution (from partner Hortonworks) and a Hadoop appliance as part of its broad Unified Data Architecture. But the message of late from this stalwart data warehousing leader is that the Hadoop platform and ecosystem are too green, too disparate, too hard and too unfamiliar for most enterprises to use for anything but high-scale, low-cost storage.
Where multistructured data like clickstreams, social, mobile and sensor data are concerned, Teradata acknowledges that Hadoop can store it at low cost, but it touts its MapReduce-capable Teradata Aster Discovery Platform as a better option than Hadoop for exploratory big data analytics. Teradata extended this role earlier this month by adding SQL-GR, a graph analysis engine, to Teradata Aster. Teradata says SQL-GR provides a more scalable alternative to NoSQL graph databases such as Neo4j.
This week Teradata announced another response to demand stirred up by the NoSQL camp by adding JavaScript Object Notation (JSON) data-handling capabilities to its flagship Teradata 14 database. JSON is frequently the data format used for data from servers, sensors and other high-volume, multistructured Web, mobile and remote data sources. But Teradata isn't a transactional database and can't offer an alternative to MongoDB, the leading NoSQL database, which is typically used for rapid development of Web and mobile transactional applications.
Teradata's JSON use case (as with all other data) is high-scale analytics, and the claim is that Teradata warehouses can now be "the analytic hub to monetize the Internet of Things." Teradata would certainly love to gobble up the high volumes of data generated by sensors and machines, but whether customers will want to pay Teradata prices to analyze such high-scale, variable data is another question.
Another place where one might question the economics of high scale is the cloud, but Teradata can't sit back while companies like Amazon, with its Redshift service, and 1010data and Kognitio, with their analytic mart capabilities, put data warehousing in the cloud.
The Teradata Cloud introduced this week gives companies a rapidly deployable, elastic option from Teradata. Capacity in the Teradata Cloud will be total-cost-of-ownership-neutral with on-premises deployments over the first three years, according to the company. You're freed from all the infrastructure, deployment and administrative burdens, though the suggestion is that if you have a fixed, long-term capacity need, it's cheaper to run it yourself. For now the Teradata Cloud is available in the U.S. only, but next year it will extend to other countries and it will add a Teradata Aster cloud option.
If you didn't pick up on the theme with Teradata, it's that it wants to enable you to do it all on its platform. Hadoop is part of that world, but the company has added options for MapReduce, graph and JSON so you can do everything on its databases. Will that really crack the big data opportunity? More on that in our Platfora analysis.
Platfora Says New Tools Are Needed
In Platfora's book, the capabilities of conventional business intelligence and SQL are fine for the old questions, but big data pioneers are building on Hadoop and want to go "far beyond that," says Ben Werther, the company's CEO and a former executive of Greenplum (the database management system now owned by EMC's Pivotal unit).
"In the old world you'd look at sales by store and so on, but in the new world you want to look at things like clickstream behavior and how it translates and relates to physical store activity," Werther explains. "Big data practitioners want to do A/B testing across their products to optimize for downstream revenue, not just clicks. They are connecting the dots across the old traditional data sources with this new world of digital clicks, ads and social big data."