7/24/2013
01:05 PM

Cloudera Brings Role-Based Security To Hadoop

Cloudera Sentry brings Hadoop fine-grained data-access controls that financial services, healthcare firms and government agencies have sought.

The Apache Hadoop big-data platform is still adolescent, but Hadoop distributor Cloudera on Wednesday introduced a maturity milestone in the form of Cloudera Sentry, a new role-based security access control project that will enable companies to set rules for data access down to the level of servers, databases, tables, views and even portions of underlying files.

Hadoop already has provisions for perimeter security, with options including open-source Kerberos, Oozie and Knox for user authentication. But once users are in, Hadoop has lacked a way to define which users can access which data. That has left security-conscious organizations such as banks, insurance companies, healthcare organizations and government agencies with two bad options: tightly restricting certain data sets to a select few users, or keeping certain types of data off Hadoop clusters entirely.

With Sentry, Cloudera says it can support four common security requests. First, security administrators can use Sentry to set specific access control privileges for authenticated users. Second, it provides for fine-grained access to subsets of data within files based on defined roles. A fine-grained view might let users see certain columns related to customers while preventing access to their financial information.

Third, role-based rules can be established whereby a fraud-detection group might get access to financial records whereas a business analyst group would not have access to that information. Finally, Sentry also supports multi-tenant security administration, which enables customers of service providers to set their own security controls without having to go through a higher-level administrator.
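
To make the role-based idea concrete, here is a minimal Python sketch of the kind of check such a system performs. It is an illustration only and does not reflect Sentry's actual policy format or API; the group, role, table and column names are all hypothetical, loosely modeled on the fraud-detection and business-analyst example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """Read access to one table, optionally limited to some columns."""
    table: str
    columns: frozenset = frozenset()  # empty set means every column


# Hypothetical roles: what each role may read.
ROLE_PRIVILEGES = {
    "fraud_role": [Privilege("customers"), Privilege("financial_records")],
    "analyst_role": [Privilege("customers", frozenset({"name", "region"}))],
}

# Hypothetical groups: which roles each group of users holds.
GROUP_ROLES = {
    "fraud_detection": ["fraud_role"],
    "business_analysts": ["analyst_role"],
}


def can_select(group: str, table: str, column: str) -> bool:
    """Return True if any role held by the group allows reading the column."""
    for role in GROUP_ROLES.get(group, []):
        for priv in ROLE_PRIVILEGES.get(role, []):
            if priv.table == table and (not priv.columns or column in priv.columns):
                return True
    return False
```

In this sketch, the fraud-detection group can read any column of the financial records, while business analysts are limited to two customer columns and are denied everything else by default.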

"Sentry will enable our customers to store more sensitive data within Hadoop and open up access to information to more users knowing that they have control over more use cases and applications," said Justin Erickson, Cloudera's director of product management, in a phone interview with InformationWeek.

For now, Sentry works with Apache Hive, through HiveServer2, and Cloudera Impala, through a new Impala 1.1 release also announced Wednesday. Cloudera plans to go beyond Hive and Impala to extend security controls to other components of the Hadoop framework, according to Erickson. Hive and Impala were chosen as a starting point because they support SQL-style access to data, both directly by users and through business intelligence applications and ETL tools.

Hive is a well-established open-source query infrastructure that runs on top of Hadoop, but it's notoriously slow because it relies on MapReduce processing running behind the scenes. Impala is a Cloudera-developed, SQL-on-Hadoop component that supports direct querying of data in the Hadoop Distributed File System (HDFS) and HBase (NoSQL database) indexes. Cloudera says Impala querying is three to 30 times faster than Hive.

Cloudera has contributed Impala to the open-source community, but it's the only vendor likely to support it. For one thing, management and monitoring of Impala queries is something you do through Cloudera's subscription-based commercial management console. For another, all of Cloudera's rivals have introduced or are working on their own SQL-on-Hadoop tools. The list includes Hortonworks-supported Stinger, MapR-supported Drill, Pivotal's proprietary HAWQ engine and IBM-supported BigSQL.

Cloudera said Sentry, too, will be contributed to the open-source community and will be an Apache-licensed project. Cloudera isn't the only vendor working on Hadoop security, and this is an area where a consistent approach across all vendors will be crucial to Hadoop's long-term success. Hortonworks, Cloudera's biggest rival, could not be reached in time for comment.

Comments
jaysimmons,
User Rank: Apprentice
8/12/2013 | 10:18:28 PM
re: Cloudera Brings Role-Based Security To Hadoop
Technology such as the Sentry technology from Cloudera is crucial when it comes to data management and security, and I am actually kind of surprised that the Hadoop clusters haven't had these kinds of security access applications. Having the options to assign role-based security privileges to information will be beneficial to all Hadoop users and administrators and should contribute to their growing popularity.

Jay Simmons
Information Week Contributor
D. Henschen,
User Rank: Author
7/24/2013 | 9:57:41 PM
re: Cloudera Brings Role-Based Security To Hadoop
After press time, Hortonworks offered the following statement about Sentry from Shaun Connolly, VP of Corporate Strategy at Hortonworks:

"The capabilities that Cloudera is targeting make sense and are valuable. Since Cloudera Sentry (previously called Cloudera Access Server) plugs into HiveServer2, including it into the Apache Hive project would make logical sense. With that said, by separating this work from Apache Hive, Cloudera is introducing a new authorization model for Cloudera, not 'for Hadoop.' Unfortunately Sentry's broader community benefit may be limited. Hortonworks engineers working within the Apache Hive community are open to working with Cloudera on integrating these capabilities directly into the Apache Hive project."

See my concluding comments above about a single approach to security being something that would be for the good of all Hadoop users and distributors.