Splice Machine SQL Database Scales On Hadoop

Splice Machine promises SQL- and ACID-compliant RDBMS for analytics and transaction processing on a super-scalable, low-cost Hadoop platform.

Promising the best of both worlds, startup Splice Machine last week announced the latest stab at putting SQL on Hadoop, but this time it's a fully SQL-compliant and ACID-compliant relational database management system (RDBMS) on Hadoop that's not just for analytics.

"Splice Machine can replace Oracle, Microsoft SQL Server, IBM DB2, or MySQL, where those systems might hit the wall from a performance or cost perspective," said Monte Zweben, CEO of Splice Machine, in a phone interview with InformationWeek.

Hadoop provides the scale-out technology for Splice Machine, so it runs on scalable, commodity clusters. At the same time it's compatible with existing investments in SQL-based business intelligence software, ETL systems, and applications through an ODBC/JDBC driver.

Several databases have been ported to run on top of Hadoop, including Pivotal's Greenplum database (through HAWQ) and InfiniDB, but these are specialized databases designed for high-scale querying and analysis. Splice Machine, which marries the open-source Apache Derby Java-based database with Hadoop's HBase NoSQL database, touts RDBMS-speed transaction processing.

"Our unique differentiation is that we're the only [SQL-on-Hadoop option] that can support concurrent reads and writes in a transactional context with ACID compliance," says Zweben.

Splice Machine uses a concurrency control method called "snapshot isolation" in combination with HBase, which has ACID properties over updates in a single table. The Apache Derby SQL planner and optimizer have been extended to take advantage of Hadoop's parallel architecture, according to Zweben. As plans are executed on each node, they're spliced back together -- thus the name of the company.
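For readers unfamiliar with snapshot isolation, the following is a minimal, generic sketch of the technique in Python. It is not Splice Machine's implementation, and it does not use HBase or Derby. The class names (Store, Transaction, WriteConflict), the logical clock, and the first-committer-wins conflict rule are assumptions chosen for illustration. Each transaction reads from the state of the data as of the moment it began, so readers are never blocked by concurrent writers, and a commit is rejected if another transaction has already committed a write to the same key since that moment.

```python
class WriteConflict(Exception):
    """Raised when a concurrent transaction already committed a write to the same key."""


class Store:
    """Hypothetical in-memory multi-version store (illustration only)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp, advanced on every commit

    def begin(self):
        # A transaction's snapshot is everything committed up to now.
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # uncommitted writes, visible only to this transaction

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort if any key we wrote has a version
        # committed after our snapshot was taken.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise WriteConflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))


# Usage: a reader keeps its snapshot while a concurrent writer commits.
store = Store()
setup = store.begin()
setup.write("balance", 100)
setup.commit()

reader = store.begin()
writer = store.begin()
writer.write("balance", 150)
writer.commit()

print(reader.read("balance"))         # 100 (the reader's snapshot is unchanged)
print(store.begin().read("balance"))  # 150 (a new transaction sees the commit)
```

If the reader above then tried to write "balance" and commit, the sketch would raise WriteConflict, because the writer committed to that key after the reader's snapshot was taken. A production system would also have to handle durability, distributed timestamps, and cleanup of old versions, none of which are shown here.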

"We started with two well-established open-source stacks, Derby and Hadoop, and that's one of the reasons we can come to market so quickly," says Zweben. Splice Machine was founded in 2012.

With last week's introduction, Splice Machine entered public beta, but the company says it has 15 charter customers in industries including digital marketing, telecom, and high-tech. One of those customers is well-known marketing services firm Harte Hanks, which has been testing Splice Machine since last summer.

Harte Hanks is poised to replace Oracle RAC in a campaign-management application that combines IBM Unica, IBM Cognos reporting, Ab Initio data-integration software, and Trillium data-cleansing technologies. All of the above are designed to run on or work with SQL RDBMSs, so moving the app onto Hadoop or

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

D. Henschen,
5/19/2014 | 4:49:47 PM
Cloudera or MapR: Harte Hanks Considers its options
It's interesting to note that Harte Hanks -- the Splice Machine customer interviewed for this article -- hasn't yet settled on Cloudera for the long term. Harte Hanks needs a dedicated Splice Machine database instance for each customer. On Cloudera those have to be separate physical instances, but on MapR they could be virtual instances.

"Cloudera uses more of the open source stack and fewer proprietary pieces than MapR," said Harte Hanks' Robert Fuller, explaining his initial choice of Cloudera. "MapR now promises to support all of the open source pieces of Hadoop but at the same time their proprietary pieces offer substantial benefits."

MapR has invested heavily in multi-tenancy, for example, Fuller explained. With Cloudera or Hortonworks, you have to tune the cluster to your applications, and you can only do it cluster-wide. "MapR has done a lot of work to make those settings shardable within the cluster, so you can make certain servers run in one configuration and others in a different configuration," Fuller said.

Harte Hanks is only in talks with MapR at this point, and it would have to prove that a Splice Machine deployment running on MapR could run all of Harte Hanks' software and give it virtual cloud deployment flexibility.