Startup Profile: DataTorrent Taps Hadoop For Streaming Analytics

DataTorrent's software, which runs on top of Hadoop, ingests tens of millions of events per second to analyze data in real time and let businesses respond faster.

DataTorrent aims to process real-time data streams so that businesses can respond quickly to information from sources such as credit card transactions, industrial sensors, mobile devices, and other systems.

There are many applications for real-time analytics, such as fraud detection or operations monitoring.

DataTorrent, which says its software can ingest tens of millions of events per second, is initially targeting financial services, telecommunications, manufacturing, and online advertisers.

DataTorrent's RTS application runs on top of Hadoop 2.0. The company has 450 Java-based operators to act on numerous types of input, including SQL, XML, sensor data, flat files, and log files. The operators process these events in memory, and RTS sends the output to business process engines, traditional databases, and visualization tools.
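To illustrate the operator model described above, here is a minimal sketch of chained in-memory operators. It is not DataTorrent's API: the names `parse_log_line`, `flag_large`, and `run_pipeline`, the comma-separated input format, and the threshold are all hypothetical, and real RTS operators are written in Java.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical operator type: consumes a stream of items, yields a new stream.
Operator = Callable[[Iterable], Iterator]


def parse_log_line(lines: Iterable[str]) -> Iterator[dict]:
    """Turn raw 'user,amount' log lines into event dictionaries."""
    for line in lines:
        user, amount = line.split(",")
        yield {"user": user, "amount": float(amount)}


def flag_large(events: Iterable[dict], threshold: float = 1000.0) -> Iterator[dict]:
    """Mark events whose amount exceeds an illustrative threshold."""
    for event in events:
        yield {**event, "flagged": event["amount"] > threshold}


def run_pipeline(source: Iterable, operators: List[Operator]) -> list:
    """Chain the operators; events pass through memory one at a time."""
    stream = source
    for op in operators:
        stream = op(stream)
    # Collecting into a list stands in for sending output to a database or dashboard.
    return list(stream)


results = run_pipeline(["alice,25.00", "bob,4200.50"], [parse_log_line, flag_large])
```

Because each operator is a generator, no intermediate collection is written to disk between stages, which is the property the in-memory design is meant to provide.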

“We designed it to be native to Hadoop, so customers can use current Hadoop clusters to run MapReduce and DataTorrent, co-tenant on the same cluster,” said co-founder and CEO Phu Hoang. “Or you can run our software in the cloud on top of Amazon Web Services.”

DataTorrent RTS also uses Yet Another Resource Negotiator (YARN) for multi-tenancy and performance, so that new instances of RTS can be scaled out to run a job in parallel.

Hoang claims RTS is highly fault-tolerant, designed to check data accuracy without stalling performance. “If there were a system fault, we restart a function and read from Hadoop its last known good state and process from that state,” Hoang said.
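The recovery behavior Hoang describes can be sketched as checkpoint-and-replay. This is a simplified illustration, not DataTorrent's implementation: the `CheckpointStore` class is a hypothetical in-memory stand-in for state saved to Hadoop, the checkpoint interval is arbitrary, and the sketch assumes the input can be re-read from a saved offset.

```python
import copy


class CheckpointStore:
    """Hypothetical stand-in for durable storage of the last known good state."""

    def __init__(self):
        self._saved = None

    def save(self, offset, state):
        # Copy the state so later mutations do not alter the checkpoint.
        self._saved = (offset, copy.deepcopy(state))

    def load(self):
        if self._saved is None:
            return 0, {"total": 0}  # no checkpoint yet: start from scratch
        offset, state = self._saved
        return offset, copy.deepcopy(state)


def process(events, store, checkpoint_every=2, fail_at=None):
    """Sum events, checkpointing periodically; resume from the last checkpoint."""
    offset, state = store.load()
    for i in range(offset, len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated fault")
        state["total"] += events[i]
        if (i + 1) % checkpoint_every == 0:
            store.save(i + 1, state)
    return state["total"]


events = [5, 10, 20, 40, 80]
store = CheckpointStore()
try:
    process(events, store, fail_at=3)  # fault strikes before the fourth event
except RuntimeError:
    pass
total = process(events, store)  # restart: reload checkpoint, replay the rest
```

After the restart, the function resumes from the checkpoint taken after the second event, so the third event is reprocessed from saved state rather than counted twice, and the final total matches an uninterrupted run.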

While DataTorrent focuses on real-time streaming, the company says it is also looking at ways to enrich its analysis with queries of data at rest.

Pricing for DataTorrent RTS is based on usage, and starts at $2,000 for an annual license per Hadoop node.

“You might have a Hadoop cluster of 50 nodes, but DataTorrent may only run on 3 nodes. We charge based on what your application is using in terms of resources,” said Hoang. “As usage grows and you add more resources, you’d purchase more licenses.”
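The usage-based model in Hoang's example works out as follows. This is a back-of-the-envelope sketch that assumes a flat rate at the quoted starting price of $2,000 per node per year; actual pricing may differ, and the function name `annual_license_cost` is made up for illustration.

```python
PRICE_PER_NODE_USD = 2000  # starting annual price quoted in the article


def annual_license_cost(nodes_in_use, price_per_node=PRICE_PER_NODE_USD):
    """Annual cost when only the nodes the application uses are licensed."""
    if nodes_in_use < 0:
        raise ValueError("nodes_in_use must be non-negative")
    return nodes_in_use * price_per_node


cost_for_nodes_used = annual_license_cost(3)     # 3 of the 50 nodes run DataTorrent
cost_if_whole_cluster = annual_license_cost(50)  # for comparison only
```

Under these assumptions, the 3-node deployment costs $6,000 a year, against $100,000 if all 50 nodes had to be licensed.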

Product: DataTorrent RTS

Principals: Phu Hoang, co-founder and CEO; Amol Kekre, co-founder and CTO

DNA: Hoang was engineer number 6 at Yahoo. Kekre also has Yahoo roots, including roles as architect and senior engineering manager for Yahoo Finance and as director of engineering for Hadoop.

Founded: May 2012

Funding: $8.75 million to date

Investors: August Capital, Morado Venture Partners, AME Cloud Ventures

Headquarters: Santa Clara, Calif.

Early Customers: Undisclosed

Competition: Apache Storm, Amazon Kinesis, IBM InfoSphere Streams

Pricing: Starts at $2,000 for an annual license per Hadoop node