Leading Apache Hadoop provider Cloudera made a slew of announcements Tuesday, starting with the release of a new, free tool for monitoring and self-service provisioning of Hadoop clusters in the cloud.
Called Cloudera Director, the tool allows business users to provision and monitor private or public cloud deployments of Hadoop, reportedly without needing IT staff intervention. If it works as advertised, Director will make scalable Hadoop cloud deployments far easier to spin up. The application also includes usage tracking for determining costs and departmental charge-backs.
The new tool represents "over a year of talking to customers" about their needs for data integration, governance, and security in the cloud, Matt Brandwein, Cloudera's director of product marketing, told InformationWeek in a phone call.
While most Cloudera customers are using on-premises Hadoop today, the company has seen a lot of experimentation in the cloud. "We want to get ahead of that... We want to have their favorite Hadoop experience waiting for them in the cloud," Brandwein said.
While Cloudera Director is free to download and use, the company expects production deployments of cloud Hadoop (either entirely in the cloud or a hybrid of on-premises and cloud) will step up to a paid Cloudera subscription, which includes an unlimited number of Director seats.
Since its release 18 months ago, Cloudera's Impala, an SQL query engine for Hadoop, has been downloaded more than a million times. The 2.0 release beefs up support for core SQL functions, vendor-specific SQL extensions, and legacy data types.
Impala 2.0 also removes query-size limits. The query engine can now run queries against physical disks, so it is no longer constrained by RAM size, as the previous version was.
Rounding out the news, Cloudera said it had significantly beefed up the security attributes of Cloudera 5.2. While healthcare, financial services, and government organizations are showing increasing interest in Hadoop databases, they often can't move on to production deployments due to a lack of industry-standard security features.
"Security has been a major area of investment for well over a year now, and we've had the opportunity to speak with hundreds of customers," Brandwein said, adding that without these security features, Hadoop deployments will forever be relegated to a side role as cut-off "sandboxes."
To address these shortcomings, Cloudera 5.2 brings the following:
- Perimeter security based on Kerberos network authentication
- Role-based access controls for both SQL and non-SQL resources
- Hardware-level security for Intel chips (leveraging the chipmaker's $740 million, 18% investment in Cloudera in March)
- Support for auditing and data-lineage tracking
- Comprehensive encryption, including centralized management of encryption keys (leveraging Cloudera's acquisition of Gazzang in June)
Cloudera made its announcements a day ahead of Strata+Hadoop World's opening in New York. The three-day conference, co-presented by O'Reilly Media and Cloudera, is sold out, with 5,000 attendees expected, nearly double the size of last year's New York show, according to Cloudera.