The purpose of big data may be simple, but the practice of it is complex. Hadoop distribution company MapR took another step towards delivering simplicity by adding enhanced support for containers to its Converged Data Platform. It has also made several improvements in security and governance.
"We're making it easier to spin up resources. We're making it easier for them to be transferable," said Jack Norris, senior VP, data and applications, in an interview with InformationWeek.
MapR's enhanced support improves how containers interact with persistent storage, allowing containerized applications to be "stateful." A stateful application keeps the context for the data it works with. For example, the context could be a customer's sales history: all the data points that make up that history form the context. For the app to retrieve that context, the data has to be persistent.
Moving the app in a Docker container to another set of data would sever the connection between the data and the app, and rebuilding that connection would require additional coding. MapR's change is to use a MapR cluster as a data services layer that sits on top of various data stores. When a Docker container is redeployed, it connects to the new data store, Norris explained.
On the security front, the Converged Data Platform will now rely on Access Control Expressions (ACEs) to define which users have access to which data. The improvement, Norris pointed out, is that a systems administrator can specify access in one or two lines of code, whereas the previous approach, based on access control lists (ACLs), took many more lines to spell out the same policy.
ACEs can also be applied at the volume level, specifying who has access to which volume of data. This provides a second layer of access security and is well suited to multi-tenant environments, Norris said.
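To make the difference between an expression and a list concrete, here is a minimal Python sketch of how a boolean access expression could be evaluated, with a volume-level check layered on top of a file-level check. The syntax (u:name for a user, g:name for a group, with &, | and ! operators) and the helper names evaluate and can_read are assumptions for illustration only, not MapR's documented grammar or API. A rule such as "everyone in finance except contractors" fits in one expression, where a flat list would have to enumerate each permitted user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = frozenset()


def tokenize(expr: str) -> list[str]:
    """Split an expression into operators, parentheses and u:/g: terms."""
    tokens, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch.isspace():
            i += 1
        elif ch in "&|!()":
            tokens.append(ch)
            i += 1
        else:
            j = i
            while j < len(expr) and not expr[j].isspace() and expr[j] not in "&|!()":
                j += 1
            tokens.append(expr[i:j])
            i = j
    return tokens


def evaluate(expr: str, user: User) -> bool:
    """Evaluate a hypothetical access expression; precedence is ! > & > |."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first so the token stream is always consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        kind, _, name = tok.partition(":")
        if kind == "u":
            return user.name == name
        if kind == "g":
            return name in user.groups
        raise ValueError(f"unknown term: {tok}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def can_read(user: User, volume_ace: str, file_ace: str) -> bool:
    # Two layers: the volume-level expression and the file-level one must both grant access.
    return evaluate(volume_ace, user) and evaluate(file_ace, user)


alice = User("alice", frozenset({"finance"}))
bob = User("bob", frozenset({"finance", "contractors"}))
volume_ace = "g:finance | g:audit"
file_ace = "g:finance & !g:contractors"
print(can_read(alice, volume_ace, file_ace))  # True
print(can_read(bob, volume_ace, file_ace))    # False
```

Bob passes the volume check but is stopped by the file-level expression, which is the kind of layered control the second ACE layer is meant to provide.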
The platform also adds selective auditing, which limits auditing to the activities the user specifies. That in turn frees up resources that would otherwise be consumed by auditing the activity of all users, according to MapR's release.
Another new feature is job and data placement control. Users can now designate some nodes in the Hadoop cluster for "high-end" processing jobs that require more memory and processing power, Norris said.
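The placement idea can be sketched as label matching: nodes carry labels, a job names the label it needs, and the scheduler picks only from nodes that carry it. This is similar in spirit to node labels in YARN, but the Node class, the "high-end" and "default" labels, and the rule of preferring the node with the most memory are all hypothetical, not MapR's configuration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str
    memory_gb: int
    labels: frozenset


def eligible_nodes(nodes: list[Node], required_label: str) -> list[Node]:
    """Return the nodes that carry the label a job asks for."""
    return [n for n in nodes if required_label in n.labels]


def place(job_name: str, required_label: str, nodes: list[Node]) -> Node:
    candidates = eligible_nodes(nodes, required_label)
    if not candidates:
        raise RuntimeError(f"no node labeled {required_label!r} for job {job_name}")
    # Hypothetical tie-break: among eligible nodes, prefer the one with the most memory.
    return max(candidates, key=lambda n: n.memory_gb)


cluster = [
    Node("node1", 64, frozenset({"default"})),
    Node("node2", 256, frozenset({"high-end"})),
    Node("node3", 512, frozenset({"high-end"})),
]
print(place("ml-training", "high-end", cluster).hostname)  # node3
print(place("etl", "default", cluster).hostname)           # node1
```

Keeping heavy jobs on labeled nodes means ordinary workloads never compete with them for the larger machines.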
MapR-DB has been upgraded to include native JSON support. Clients can also get better performance when accessing NoSQL data stored on solid-state drives (SSDs), thanks to parallel I/O. Throughput can reach 3.5 GB per second while handling 18 million messages per second, MapR said in a prepared statement.
Finally, the Converged Data Platform will gain Apache Myriad, which allows Hadoop YARN and Apache Mesos to operate side by side while sharing resources. "We're integrating Apache Mesos and YARN together to provide … centralized support for Hadoop," Norris said. Cluster resources can be shared between jobs that require YARN and those that don't.
"The mark of a great architecture or implementation is that it delivers simplicity," Norris said. "We need this to be simple, [so that] we don't have to think about the data."