The rise of interest in Apache Spark has demonstrated just how important streaming data has become in the big data ecosystem. Real-time data and the technologies that support it were perhaps the biggest stars of last month's Strata + Hadoop World conference in San Jose.
So it's probably no coincidence that the Apache Software Foundation elevated Apache Apex to a Top-Level Project (TLP) this week. The streaming and batch-processing engine for Hadoop is used by the GE Predix IoT cloud platform for industrial data and analytics, and by Capital One for real-time decisions and fraud detection.
DataTorrent contributed the technology to the Apache Software Foundation as an incubator project in August 2015 after originally creating it in 2012.
"Apache Apex is an example of the latest generation of advanced stream processing software that adds significant technology and capabilities over previous options," said Ted Dunning, vice president of the Apache Incubator, Apache Apex Incubator Mentor, and Chief Application Architect at MapR Technologies, in a prepared statement.
Apache Apex enables streaming analytics on Apache Hadoop. It was created to leverage the infrastructure provided by Hadoop components YARN and HDFS (Hadoop Distributed File System). It's a large-scale, high-throughput, low-latency, fault-tolerant, unified big data stream and batch-processing platform for the Hadoop ecosystem, the Apache Software Foundation said in a statement announcing the technology's promotion to a TLP.
Streaming technology for big data and analytics continues to grow in importance as organizations and developers bake real-time analytics into their processes and apps. In March, Forrester Research released a Wave report on big data streaming analytics that examines this trend and some of the vendors that offer the technology.
"Forrester defines perishable insights as urgent business situations (risks and opportunities) that firms can only detect and act on at a moment's notice," wrote authors and Forrester analysts Mike Gualtieri and Rowan Curran in the report. "Streaming analytics solutions can help firms detect such insights in high-velocity streams of data and act on them in real-time. Application development and delivery professionals should not dismiss streaming analytics as a form of 'traditional analytics' used for postmortem analysis. Far from it -- streaming analytics analyzes data right now, when it can be analyzed and put to good use to make applications of all kinds contextual and smarter."
Forrester put Apache Apex's creator, DataTorrent, into the Leaders section of its Wave report on big data streaming analytics, along with some pretty big names in tech -- IBM, Software AG, SAP, TIBCO Software, Oracle, and SQLstream.
"DataTorrent is the streaming startup to beat in Silicon Valley," Forrester's analysts wrote in their report. "The Yahoo-trained founders built a streaming platform to handle the world's biggest, fastest data."
Forrester notes that DataTorrent is working to deliver on other enterprise needs, too, such as a visual development tool and a library of more than 400 operators.
"The core of DataTorrent is now open sourced as Apache Apex, but making its voice heard over the chorus of other open source streaming options will be a significant challenge," the authors concluded.
In announcing the new TLP status, the Apache Software Foundation said that Apex can streamline the development of Hadoop applications by letting developers write or re-use generic Java code. That helps minimize the specialized expertise required to write apps and can therefore reduce time to market.
Apex also includes connectors for integrating with external systems such as message buses, databases, file systems, and social media feeds. Supported technologies include Apache Cassandra, Apache HBase, JDBC, and Apache Kafka.