MapR Streams: Big Data Analysis In Real-Time - InformationWeek

MapR is again beefing up its real-time efforts for big data with the release of MapR Streams. Here's how it works.

Big data vendors are all trying to give their customers something called "situational awareness" -- delivering systems that provide real-time insight into sales, transactions, and other data. MapR, one of the top three Apache Hadoop distribution companies (Cloudera and Hortonworks are the other two), will get closer to that goal with the release of MapR Streams.

The company describes the technology as a real-time "global event streaming system," which will be delivered as part of its MapR Converged Data Platform in early 2016.

MapR Streams connects and tracks multiple data streams among multiple sources. Developers can use Streams to build scalable high-volume systems that can handle billions of messages among millions of topics spread out over thousands of locations.
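
To make the idea of messages organized into topics concrete, here is a minimal in-memory sketch in Python. It is not the MapR Streams API, which the article does not describe; the `TopicLog` class, its methods, and the topic names are all hypothetical and exist only to illustrate how producers append to named topics and consumers read them back in order.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a topic-based event stream."""

    def __init__(self):
        # Each topic name maps to an append-only list of messages.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset in that topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return the messages of a topic from `offset` on, in publish order."""
        return list(self._topics.get(topic, [])[offset:])


# Two sources publish to separate (hypothetical) topics.
log = TopicLog()
log.publish("sales/store-1", {"sku": "A", "amount": 20})
log.publish("sales/store-2", {"sku": "B", "amount": 5})
log.publish("sales/store-1", {"sku": "C", "amount": 7})

# A consumer sees only its topic, in the order the messages were published.
store_1 = log.read("sales/store-1")
```

A production system would add persistence, partitioning, and replication across sites; the sketch covers only the topic and ordering concepts.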

"The operating system does a great job with processing, but lags as a data platform," Jack Norris, Chief Marketing Officer at MapR, told InformationWeek in an interview. While analysis is possible once the data is collected, it hasn't been so easy to do while the data is in stream and in use. Norris said that customers want to know what is happening in the present, rather than finding out at sundown what happened during the day.

MapR Streams aims to eliminate that time delay by allowing real-time analysis of data, regardless of source. The strategy behind the solution is to concentrate processing in the layer between the data and the apps, and to identify what type of data is being analyzed (files, tables, documents, or streams) rather than classify it according to the silo it was drawn from.

(Image: NorthernStock/iStockphoto)

This requires a change in architecture rather than a rearrangement of existing apps, Norris said, and that poses a challenge. "Some of it is scale. Some of it is the disparity of sources. Some of it is global synchronization, which requires sophisticated replication.

"Big data is generated one event at a time," Norris said. "The sum total of all that is big data." Without applying analysis to the live stream, the data simply goes into a repository and sits there until analyzed. With MapR Streams, consolidation happens in minutes, not at the end of the day.

That consolidation provides the view of "what is happening." But that view alone can lead to an operational paradigm shift in certain industries, depending on how the technology gets used.
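
The contrast between analyzing events as they arrive and waiting for an end-of-day batch can be sketched in a few lines of Python. This is an illustration only, not MapR code: the `RunningTotals` class, the event fields, and the store names are assumptions made for the example.

```python
from collections import defaultdict


class RunningTotals:
    """Hypothetical consumer that updates per-store sales totals per event."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.count = 0

    def observe(self, event):
        """Fold one event into the totals and return that store's new total."""
        self.totals[event["store"]] += event["amount"]
        self.count += 1
        return self.totals[event["store"]]


# Made-up events, processed one at a time as they would arrive from a stream.
events = [
    {"store": "east", "amount": 20.0},
    {"store": "west", "amount": 5.0},
    {"store": "east", "amount": 12.5},
]

totals = RunningTotals()
for event in events:
    # The view of "what is happening" is current after every event,
    # rather than only after a batch job runs over the stored data.
    totals.observe(event)
```

The same totals could be computed by a nightly batch job; the difference is that here they are available after each event.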

The system enables developers to unite analytics, transactions, and stream-processing while reducing data duplication and minimizing cluster sprawl. Cross-site replication allows the construction of global real-time apps that can provide reliable message delivery and order consistency. MapR Streams can interface with other Apache Software Foundation big data projects including Spark Streaming, Apache Storm, Apache Flink, and Apache Apex. 

Norris called the new architecture "the biggest change in enterprise computing in decades."

Why? He gave several examples. Take online retailing, where a big data insight can suggest additional, related products to add to a transaction. Do that half a million times over the course of a year and one can realize significant additional revenue.

The technique can be applied to credit card transactions, using risk mitigation to avoid the cost of fraud. It can also be used in the oil and gas industry to monitor pipeline and refinery equipment and identify scheduling opportunities for preventive maintenance without disrupting continuous operations.

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...

Comments
Charlie Babcock, User Rank: Author
12/9/2015 | 7:07:07 PM
Using lots of data in near real time the goal
The goal is to get to data use in real time, and that goal is still a long way off. But MapR, Cloudera, Hortonworks, and the Spark project are all pushing in that direction. Compared to how hard it used to be to make use of a large amount of data quickly, we've come a long way on what will prove an exhausting journey.
Gary_EL, User Rank: Ninja
12/8/2015 | 4:08:24 PM
Data, data, everywhere
Old news is no news at all. If all the titanic amounts of information being harvested can't be immediately analyzed and exploited, it is of significantly less value to the organizations that are spending big money to obtain it. More efforts such as those described will be needed, as the IoT advances and collects more data still.