Splice Machine SQL Database Scales On Hadoop - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Data Management // Big Data Analytics
02:55 PM
Connect Directly

Splice Machine SQL Database Scales On Hadoop

Splice Machine promises SQL- and ACID-compliant RDBMS for analytics and transaction processing on a super-scalable, low-cost Hadoop platform.

a NoSQL database was out of the question. At the same time, running Oracle RAC for a series of 10-terabyte to 20-terabyte instances for each Harte Hanks customer was getting to be expensive.

"We have a lot of investment in things that run on SQL, including Cognos, Unica, ETL work, data cleansing, customer roll-ups and models, and staff that are doing analytics with SAS and SPSS," said Robert Fuller, managing director of product innovation at Harte Hanks, in a phone interview with InformationWeek. "With Splice Machine we can still work with all that, but we're getting the benefits of Hadoop scaling and performance as well as lower-cost hardware and lower-cost software."

By way of comparison, Fuller said adding six nodes to a Hadoop cluster requires $25,000 worth of hardware, whereas adding equivalent capacity with Oracle RAC and a separate storage area network would cost more than $100,000 just for the hardware. Add the software licenses, and "you're not doubling or tripling the cost, you're ten times the cost."

Splice Machine last summer demonstrated in its own labs that it could run the Harte Hanks applications and beat Oracle RAC performance. By the end of last year, Harte Hanks built out a Cloudera cluster and proved that it could replicate that performance using customer data in its own datacenters.

"One of the common campaign-performance queries that we've tested takes about 183 seconds in our production Oracle RAC deployment, and it's taking less than 20 seconds on Splice Machine on a nine-node Cloudera cluster," says Fuller.

The next step for Harte Hanks is to build out replication and high-availability features and take Splice Machine into production. Fuller has not had to hire new staff to learn how to deploy and use Hadoop thus far, but that may change, he says, when Harte Hanks starts taking advantage of MapReduce processing, as well as SQL OLTP and analysis on top of Hadoop.

The next step for Splice Machine hinges in part on the pending 1.0 release of HBase, says Zweben, noting that this foundation of the Hadoop ecosystem is still at the 0.95 release stage. Splice Machine 1.0 will be generally available sometime this year, he vows, but he notes that the Splice Machine public beta release now available for download is suitable for production deployment.

"HBase powers RocketFuel, a company that handles on the order of 15 petabytes of advertising optimization data a day," says Zweben, who is a member of RocketFuel's board of directors. "Our beta system is ready to be put into operation today."

Splice Machine's apparent success in doing it all on Hadoop makes one wonder if the commercial database incumbents can and will follow suit.

Our InformationWeek Elite 100 issue -- our 26th ranking of technology innovators -- shines a spotlight on businesses that are succeeding because of their digital strategies. We take a close at look at the top five companies in this year's ranking and the eight winners of our Business Innovation awards, and offer 20 great ideas that you can use in your company. We also provide a ranked list of our Elite 100 innovators. Read our InformationWeek Elite 100 issue today.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ... View Full Bio

We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
2 of 2
Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
D. Henschen
D. Henschen,
User Rank: Author
5/19/2014 | 4:49:47 PM
Cloudera or MapR: Harte Hanks Considers its options
It's interesting to note that Harte Hanks -- the Splice Machine customer interviewed for this article -- hasn't yet settled on Cloudera for the long term. Harte Hanks needs dedicated database instances of Splice Machine for each customer, so it needs separate instances. On Cloudera it has to be separate physical instances, but on MapR it could be virtual instances.

"Cloudera uses more of the open source stack and fewer proprietary pieces than MapR," said Harte Hank's Robert Fuller, explaining his initial choice of Cloudera. "MapR now promises to support all of the open source pieces of Hadoop but at the same time their proprietary piece offer substantial benefits."

MapR has invested heavily in multi-tenancy, for example, Fuller explained. In Cloudera or Hortonworks, you have to tune the cluster to your applications, but you have to do it cluster-wide. "MapR has done a lot of work to make those settings shardable within the cluster, so you can make certain servers run in one configuration and others in a different configuration," Fuller said.

Harte Hanks is only talking to MapR at this point, and it would have to prove that a Splice Machine deployment running on MapR could run all of Harte Hank's software and give it virtual cloud deployment flexibility.
Think Like a Chief Innovation Officer and Get Work Done
Joao-Pierre S. Ruth, Senior Writer,  10/13/2020
10 Trends Accelerating Edge Computing
Cynthia Harvey, Freelance Journalist, InformationWeek,  10/8/2020
Northwestern Mutual CIO: Riding Out the Pandemic
Jessica Davis, Senior Editor, Enterprise Apps,  10/7/2020
White Papers
Register for InformationWeek Newsletters
Current Issue
[Special Report] Edge Computing: An IT Platform for the New Enterprise
Edge computing is poised to make a major splash within the next generation of corporate IT architectures. Here's what you need to know!
Flash Poll