OpenStack Juno Offers Automated Hadoop Provisioning

The OpenStack Foundation's Juno release features automated provisioning of clusters for three different Hadoop distributions.

Charles Babcock, Editor at Large, Cloud

October 16, 2014

The Juno release of OpenStack cloud software, available Thursday, includes new big-data handling capabilities, the ability to create different classes of object storage, and support for virtualized network functions.

Juno is the tenth release of OpenStack, as the OpenStack Foundation sticks to its six-month release cycle. Its predecessor, Icehouse, came out in mid-April; the Havana release preceded Icehouse by six months.

Juno is distinguished by its growing maturity, with implementers and members of its user community making several additions to ease installation, said Jonathan Bryce, executive director of the OpenStack Foundation, in an interview. Out of 310 new features, "the bulk of them are focused on making it easier to build and install OpenStack clouds, perform upgrades, and achieve greater maturity in the software," Bryce said.

A new component for handling big data, called Data Processing Service, takes its place alongside the basic Compute, Networking, and Object Storage components of OpenStack. Eighteen months in the making, it automates the provisioning of clusters on which to run the Hadoop big-data system. The new service supports Hadoop distributions from the Apache Software Foundation, Hortonworks, and Cloudera. It also supports using Apache Spark to stream data into a Hadoop system.
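To make "automates the provisioning of clusters" a little more concrete, here is a minimal Python sketch of the general idea, assuming a cluster is described by a template of node groups, each listing the Hadoop daemons to run and how many VMs to boot. The names `NodeGroup`, `ClusterTemplate`, and `plan_instances`, and the distribution label, are hypothetical illustrations, not the Data Processing Service's actual API; a real service would also boot the VMs, wire up networking, and install and configure Hadoop.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration only; NOT the Data Processing Service API.
@dataclass
class NodeGroup:
    name: str
    processes: list[str]  # Hadoop daemons to run on every node in this group
    count: int            # number of VMs to boot for this group

@dataclass
class ClusterTemplate:
    distribution: str     # hypothetical label, e.g. "apache"
    node_groups: list[NodeGroup] = field(default_factory=list)

def plan_instances(cluster_name: str, template: ClusterTemplate) -> list[dict]:
    """Expand a template into one provisioning request per VM."""
    plan = []
    for group in template.node_groups:
        for i in range(group.count):
            plan.append({
                "hostname": f"{cluster_name}-{group.name}-{i}",
                "distribution": template.distribution,
                "processes": list(group.processes),
            })
    return plan

template = ClusterTemplate(
    distribution="apache",
    node_groups=[
        NodeGroup("master", ["namenode", "resourcemanager"], 1),
        NodeGroup("worker", ["datanode", "nodemanager"], 3),
    ],
)
for vm in plan_instances("demo", template):
    print(vm["hostname"], vm["processes"])
```

The template separates what the cluster should look like from how each node is booted, which is what lets one service target several Hadoop distributions.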

OpenStack Object Storage has been given a policy control that allows different pools of objects to be created rather than a simple undifferentiated mass of objects. "We tend to think of object storage as a low-cost way to store massive amounts of data," which is fine for some purposes but can make it more difficult to get different types of performance out of the pooled storage, Bryce said.

The lowest-cost pool might have a policy of maintaining only two copies of the data instead of the usual three, lowering the cost per gigabyte of storage. Another pool might sit on high-performance disks that deliver faster I/O. With the ability to set policies over "storage zones," data managers will be able to create sections of object storage for different purposes. SwiftStack, Intel, and Box were among the firms that initiated the policy-based storage zones.
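The saving from dropping a copy is simple arithmetic: with n replicas, each usable gigabyte consumes n gigabytes of raw disk. A minimal Python sketch, assuming raw disk is the only cost (ignoring servers, power, and network) and using a made-up price of $0.05 per raw gigabyte:

```python
def cost_per_usable_gb(raw_cost_per_gb: float, replicas: int) -> float:
    """Each usable gigabyte is stored `replicas` times, so it consumes that much raw disk."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_cost_per_gb * replicas

# Hypothetical pools; the $0.05/GB raw-disk price is an assumed figure for illustration.
RAW_COST = 0.05
pools = {"standard (3 copies)": 3, "low-cost (2 copies)": 2}
for name, replicas in pools.items():
    print(f"{name}: ${cost_per_usable_gb(RAW_COST, replicas):.2f} per usable GB")

# Fraction saved by the two-copy pool relative to the three-copy pool.
saving = 1 - cost_per_usable_gb(RAW_COST, 2) / cost_per_usable_gb(RAW_COST, 3)
print(f"saving: {saving:.0%}")
```

Under that assumption, the two-copy pool costs one-third less per usable gigabyte, in exchange for surviving one fewer simultaneous loss of a copy.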

The Juno release also expands support for network function virtualization (NFV) in an OpenStack cloud. AT&T initiated the feature, with other telcos, including Comcast, Verizon, Time Warner, Telefónica, and Orange, supporting the effort. Telcos invest in six-figure routers and other purpose-built equipment to launch new services, such as text messaging, and then must get years of use out of that equipment to make it pay off. By moving such functions off dedicated hardware and onto virtual appliances, they would have a better chance of changing services more frequently. Enterprises and other builders of OpenStack clouds will also be able to use NFV to build more flexible, easily modified networks, Bryce noted.

In addition, Juno features:

  • The addition of SAML support, enabling a federated identity system that spans more than one OpenStack cloud.

  • The ability to perform live upgrades on running OpenStack code, reducing the interruptions and downtime of regular maintenance. The scripts and tools needed to perform the upgrades are included.

  • An improvement to OpenStack Orchestration that lets it roll back a failed deployment and perform a thorough cleanup.
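Rollback with cleanup follows a familiar pattern: record each resource as it is created, and if a later step fails, delete what was already created in reverse order, continuing even if one deletion fails. The Python sketch below shows only that general pattern; it is not how OpenStack Orchestration implements rollback, and the `Step` class, the `deploy` function, and the resource names and error message are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Generic rollback-on-failure illustration; NOT OpenStack Orchestration's implementation.
@dataclass
class Step:
    name: str
    create: Callable[[], None]
    delete: Callable[[], None]

def deploy(steps: list[Step]) -> bool:
    """Create resources in order; on failure, delete those already created, newest first."""
    created: list[Step] = []
    for step in steps:
        try:
            step.create()
        except Exception as exc:
            print(f"{step.name} failed ({exc}); rolling back")
            for done in reversed(created):
                try:
                    done.delete()
                except Exception as cleanup_exc:
                    # Keep cleaning up the remaining resources even if one deletion fails.
                    print(f"cleanup of {done.name} failed: {cleanup_exc}")
            return False
        created.append(step)
    return True

log: list[str] = []

def ok(name: str) -> Step:
    return Step(name, lambda: log.append(f"create {name}"), lambda: log.append(f"delete {name}"))

def boom() -> None:
    raise RuntimeError("quota exceeded")  # hypothetical failure

deploy([ok("network"), ok("server"), Step("volume", boom, lambda: None)])
print(log)
```

Deleting in reverse order matters because later resources (a server) usually depend on earlier ones (its network).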

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.