Big data vendors are all trying to give their customers something called "situational awareness" -- delivering systems that provide real-time insight into sales, transactions, and other data. MapR, one of the top three Apache Hadoop distribution companies (Cloudera and Hortonworks are the other two), will get closer to that goal with the release of MapR Streams.
The company describes the technology as a real-time "global event streaming system," which will be delivered as part of its MapR Converged Data Platform in early 2016.
MapR Streams connects and tracks multiple data streams among multiple sources. Developers can use Streams to build scalable high-volume systems that can handle billions of messages among millions of topics spread out over thousands of locations.
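To make the topic model concrete, here is a minimal in-memory sketch in Python of producers appending messages to named topics and a consumer reading them back in order. It is not the MapR Streams API: the `TopicLog` class, its `publish` and `read` methods, and the topic names are hypothetical, invented for illustration. A real system would add persistence, partitioning, and replication.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a topic-based message stream."""

    def __init__(self):
        # Each topic maps to an append-only list of messages.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset in that topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return the messages in a topic from the given offset, in publish order."""
        return list(self._topics.get(topic, [])[offset:])


# Made-up topics and messages, for illustration only.
log = TopicLog()
log.publish("sales.us-east", {"sku": "A1", "amount": 20})
log.publish("sales.eu-west", {"sku": "B7", "amount": 35})
log.publish("sales.us-east", {"sku": "C3", "amount": 15})

us_east = log.read("sales.us-east")
```

Each topic keeps its own ordered log, so a consumer that remembers its last offset can resume reading without seeing messages from other topics.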
"The operating system does a great job with processing, but lags as a data platform," Jack Norris, Chief Marketing Officer at MapR, told InformationWeek in an interview. While analysis is possible once the data is collected, it hasn't been so easy to do while the data is in stream and in use. Norris said that customers want to know what is happening in the present, rather than finding out at sundown what happened during the day.
MapR Streams aims to eliminate that time delay by allowing real-time analysis of data, regardless of source. The strategy behind the solution is to concentrate processing in the layer between the data and the apps, and to identify what type of data is being analyzed (files, tables, documents, or streams) rather than classifying it according to the silo it was drawn from.
This requires a change in architecture rather than a rearrangement of existing apps, Norris said, and that poses a challenge. "Some of it is scale. Some of it is the disparity of sources. Some of it is global synchronization, which requires sophisticated replication.
"Big data is generated one event at a time," Norris said. "The sum total of all that is big data." Without applying analysis to the live stream, the data simply goes into a repository and sits there until analyzed. With MapR Streams, consolidation happens in minutes, not at the end of the day.
That consolidation provides the view of "what is happening." That view alone can lead to an operational paradigm shift in certain industries, depending on how the technology gets used.
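The difference between analyzing events as they arrive and waiting for an end-of-day batch can be sketched in a few lines of Python. This is an illustrative assumption, not MapR's implementation: the `RunningTotals` class, the event fields, and the sample values are all hypothetical.

```python
from collections import defaultdict


class RunningTotals:
    """Hypothetical per-event aggregator: totals are current after every event."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, event):
        # Update the aggregate as soon as the event arrives,
        # instead of storing it for an end-of-day batch job.
        self.totals[event["store"]] += event["amount"]
        return self.totals[event["store"]]


# Made-up sample events standing in for a live stream.
events = [
    {"store": "north", "amount": 10.0},
    {"store": "south", "amount": 4.5},
    {"store": "north", "amount": 2.5},
]

agg = RunningTotals()
snapshots = [agg.on_event(e) for e in events]

# The batch-style answer, computed only after all events are collected.
batch_total_north = sum(e["amount"] for e in events if e["store"] == "north")
```

After each call, `agg.totals` already reflects every event seen so far, so the streaming and batch answers agree once the last event is processed. The streaming version simply has the answer available earlier.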
The system enables developers to unite analytics, transactions, and stream-processing while reducing data duplication and minimizing cluster sprawl. Cross-site replication allows the construction of global real-time apps that can provide reliable message delivery and order consistency. MapR Streams can interface with other Apache Software Foundation big data projects including Spark Streaming, Apache Storm, Apache Flink, and Apache Apex.
Norris called the new architecture "the biggest change in enterprise computing in decades."
Why? He gave several examples. Take online retailing, where a big data insight can suggest additional, related products to add to a transaction. Do that half a million times over the course of a year and one can realize significant additional revenue.
The technique can be applied to credit card transactions, using risk mitigation to avoid the cost of fraud. It can also be used in the oil and gas industry to monitor pipeline and refinery equipment and identify scheduling opportunities for preventive maintenance without disrupting continuous operations.