HANA, Hadoop Help SAP Connect IoT And Big Data - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

IoT
IoT
Cloud // Infrastructure as a Service
Commentary
5/20/2015
12:29 PM
50%
50%

HANA, Hadoop Help SAP Connect IoT And Big Data

Engineers at Sapphire Now discussed the potential and limitations for SAP S/4HANA as a platform for managing an enterprise Internet of Things and the big data that strategy requires.

IoT World: Separating Smart And Dumb Things
IoT World: Separating Smart And Dumb Things
(Click image for larger view and slideshow.)

SAP has been very clear that their direction for the future lies along the path of its HANA in-memory technology. At the Sapphire Now 2015 conference this month, SAP execs were also very clear that they want SAP S/4HANA to be a major player in the enterprise Internet of Things. And that's where things get very complicated.

Pretty much everyone agrees that the Internet of Things (IoT) will send a tidal wave of data into the enterprise. Billions of sensors and controllers, all wanting to talk to some other machine for analysis and instruction, create trillions of transactions and data events. How can an organization reconcile built-for-speed in-memory computing with huge data quantities? In a series of conversations at Sapphire, SAP engineers and managers shared their thinking about how to do that.

First, SAP engineers candidly admit that solving the IoT (and related Big Data) issues remains a work in progress. There are products in place -- and customers using those products -- but engineers recognize they need to offer more. The systems in place, though, start with decisions made close to the sensors and edge systems.

One example of SAP's IoT is for controlling intelligent vehicles.

(Image: SAP)

One example of SAP's IoT is for controlling intelligent vehicles.

(Image: SAP)

SAP says that many control decisions don't need to be made by systems at the enterprise core; they can be handled by concentration and control systems at the edge. SQL Anywhere is a lightweight database manager that can be placed on a single-board computer. This database will gather the data from an array of sensors and make moment-to-moment control changes to those systems.

Both the change data (data on error conditions or major changes) and full sensor data is uploaded to the enterprise database using Mobilize -- a product that will queue the data and upload it in bursts when system resources and connectivity are available. At this point, the Internet of Things starts looking very much like a time-critical big data problem. If you decide to do processing in the core rather than at the edge, you face the obstacle of having to get decisions back to the edge in near real time. 

Once the sensor and controller data is uploaded, data-tiering goes into effect. In the conversations I had with engineers they spoke of the critical importance of deciding whether data is "hot," "warm," "cold," or something between one of these increments. (Although, to be honest, why an organization would rely on "tepid" data is beyond me.) Some data can be categorized based simply on its origin or content. Other data will be sorted based on analysis that occurs in the enterprise core. This analysis, the engineers said, can happen most rapidly in a HANA database.

[ Shocking growth or overly optimistic prediction? IoT Market Will Grow 19 Percent in 2015, IDC Predicts. ]

A HANA database that references data sitting in other databases (which can include Hadoop, MongoDB, Oracle, or just about any other commercial database) can use SAP's Smart Data Access -- a data virtualization feature that lets the HANA control structures see data in "virtual data tables" that span multiple databases. Smart Data Access has been around since SAP HANA SP 6, and its limitations have been sorted out reasonably well. The situations in which Smart Data Access isn't sufficient center on the minute-by-minute control operations required by the Internet of Things. (Think about trying to control a vehicle or adjust a sensitive industrial chemical process from a remote site.)

SAP expects to use a concept called "streaming" to use HANA for analysis without assuming that all data is hot until proven otherwise. Streaming will, essentially, use a HANA instance as a high-speed Hogwarts Sorting Hat to determine whether the data coming in should be stored in another HANA database or sent to a cooler location for future analysis at a somewhat more leisurely pace. This step is the one that's critical for keeping big data analysis moving along at HANA speed -- and the one that can be hardest to understand.

"Streaming" in this context is depositing the incoming data in a HANA database long enough for the sorting algorithms to do their jobs. Once the system gives the data its priority level, it's sent on its merry way, clearing space for the next data bits to stream in. As with all plans of this sort, a great deal of the system's performance depends on buffering, caching, and so on, and it's within those pieces of software and service where some of the rough edges lie. Within the next few months, though, the engineers are fairly confident that they'll be demonstrating full speed and capability of all the connections and translations that need to happen.

Virtual Tables unite data stored in different database structures.
(Image: SAP)

Virtual Tables unite data stored in different database structures.

(Image: SAP)

Components in the SAP big data universe won't begin and end with HANA and Hadoop. Cloudera is one product that the SAP engineers mentioned in many discussions when they talked about mechanisms for connecting a HANA instance to Hadoop. Cloudera is a Hadoop-based data management system that includes the central pieces of Hadoop, software to let data flow between Hadoop and other data sources, and a management structure to tie everything together. Cloudera already has connectors to many of the databases and third-party analytics packages that an organization might want to build into a full big data solution. SAP BI is also likely to be a part of the system since the BI suite already includes many of the analytics bits that will be used to make sense of the data waves rolling in.

Given the SAP IoT customer base, the "things" in question are much more likely to be on an industrial shop floor than attached to a consumer's wrist. The systems controlled will probably be manipulating industrial processes rather than home air conditioning. Even so, the data volumes will be huge, and the time windows in which systems must respond to problems will be small. With HANA, SAP has made a series of key early steps to power their customer and partner plans. If the company can polish and fit the last of the pieces it's promising, SAP S4/HANA will be a formidable presence in the IoT market.

[Did you miss any of the InformationWeek Conference in Las Vegas last month? Don't worry: We have you covered. Check out what our speakers had to say and see tweets from the show. Let's keep the conversation going.]

Curtis Franklin Jr. is Senior Editor at Dark Reading. In this role he focuses on product and technology coverage for the publication. In addition he works on audio and video programming for Dark Reading and contributes to activities at Interop ITX, Black Hat, INsecurity, and ... View Full Bio
We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
Curt Franklin
50%
50%
Curt Franklin,
User Rank: Strategist
5/20/2015 | 2:11:55 PM
Re: Queuing and its problems
@SachinEE, the queueing can be a critical issue. I think that the SAP engineers are depending on two factors to get around any problems. First, there's the understanding that the endpoint can be quite patient in waiting to upload the "cold" data after local control issues are handled. Next, SAP has gotten quite good with data compression -- one of the keys to doing in-memory computing on a budget smaller than the GDP of most small European nations.

In one important sense, they're depending heavily on intelligence throughout the infrastructure: Much of the initial heavy lifting in a data sense is happening out at the edge. Fortunately, the latest developments in embedded control systems seem to make this a reasonable strategy.
SachinEE
50%
50%
SachinEE,
User Rank: Ninja
5/20/2015 | 1:51:34 PM
Queuing and its problems
Well I understand how the queuing works but at most when the data is actually being uploaded networks that upoload that data would undergo massive parallel port feeding, i.e. since data is being stored in the form of queues and uploaded in "bursts" it is obvious the networks would work in parallel to actually make that happen. Moreover it would increase network latencies and create more problems when this "queuing device" fails. 
Slideshows
IT Careers: Top 10 US Cities for Tech Jobs
Cynthia Harvey, Freelance Journalist, InformationWeek,  1/14/2020
Commentary
Predictions for Cloud Computing in 2020
James Kobielus, Research Director, Futurum,  1/9/2020
News
What's Next: AI and Data Trends for 2020 and Beyond
Jessica Davis, Senior Editor, Enterprise Apps,  12/30/2019
White Papers
Register for InformationWeek Newsletters
Video
Current Issue
The Cloud Gets Ready for the 20's
This IT Trend Report explores how cloud computing is being shaped for the next phase in its maturation. It will help enterprise IT decision makers and business leaders understand some of the key trends reflected emerging cloud concepts and technologies, and in enterprise cloud usage patterns. Get it today!
Slideshows
Flash Poll