“It’s all over the place…”
Time was you’d hear that phrase and think that things were disorganized, unstructured, scattered, bad.
In the era of the cloud, that concept has been turned on its head. Now applications, workloads, metadata, and more are distributed to multiple servers in multiple data centers in multiple geographies. In other words, “it’s all over the place!” And that’s a GOOD thing.
Organizations of all sizes have chosen to deploy OpenStack for a flexible cloud environment that’s built to scale out efficiently “all over the place” on commodity hardware. And while OpenStack has become the preference of many who are transitioning to cloud computing, open-source Ceph storage has become the preference of those deploying OpenStack. In fact, the October 2016 OpenStack.org user survey shows that Ceph has garnered nearly 60% of the OpenStack user base for block storage.
But Ceph storage is not just about block. Ceph accommodates massive scale on standard commodity hardware, provides a unified platform for block, object, and shared file system data, and integrates tightly with OpenStack services. It’s a far better option than traditional storage appliances built on proprietary hardware with embedded proprietary software. That’s like buying the ultimate driving machine and putting square tires on it. The best engine available becomes hobbled. Worse, you’re running it on gasoline that’s only available from one station, so you must drive there every time you need more fuel.
Everyday Tasks Need Infrastructure Support
Part of the value of Ceph’s tight integration with OpenStack is its ability to support the everyday tasks of OpenStack users without those square tires. This support is fundamental to Ceph’s design, yet OpenStack users often overlook the distinct possibility that their storage infrastructure isn’t up to the task.
One such task is the creation of virtual machines. OpenStack users make copies of VMs frequently -- not because they’re afraid of losing them, but because they want to reuse them quickly, such as turning an application into a template that can be redeployed efficiently. Fortunately, the very nature of Ceph’s architecture addresses this requirement, because data and clones of data are automatically distributed “all over the place.”
In the square-tired, ball-and-chain version of the ultimate driving machine, these clones are not distributed; instead they create a resource bottleneck that limits their usefulness until an OpenStack developer manually copies and strategically places them. What makes Ceph different?
CRUSH: Adding VM Scale To Scale-Out Storage
Imagine if all cable TV services were designed around the paradigm that viewers select a program that is then downloaded to their set-top-boxes and played. Every viewing of every program would require wait time at the start.
That’s not very different from the way most virtual machines are cloned and distributed across a network. The server running the hypervisor is instructed to make a duplicate of one of the VMs contained in its storage. It makes that copy, which takes a few minutes, then it uploads that copy across the network, which takes a few more minutes. It’s bearable if it’s just one copy, but what if you need 1,000 copies? Suddenly those minutes turn into hours, days, or even weeks.
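To see how quickly “a few minutes” compounds, here’s a back-of-envelope calculation. The five-minute figure is a hypothetical stand-in for the article’s “a few minutes” per copy-and-upload:

```python
# Hypothetical assumption: ~5 minutes to copy one VM image and push it
# across the network, as the article describes.
minutes_per_copy = 5
copies = 1_000

total_minutes = minutes_per_copy * copies
print(f"{total_minutes} minutes "
      f"≈ {total_minutes / 60:.0f} hours "
      f"≈ {total_minutes / 1440:.1f} days")
# → 5000 minutes ≈ 83 hours ≈ 3.5 days
```

Three and a half days of serial copying for a single fleet of clones -- and that’s before any retries or network contention.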
Similarly, many scale-out architectures require a lookup or metadata server to allow nodes to access required data. The node first talks to the lookup server, which tells it where the data resides on the cluster. Then the node retrieves the data. This can be slow and cumbersome. Ceph instead uses a novel data placement algorithm called CRUSH -- Controlled Replication Under Scalable Hashing. Each OpenStack compute node runs CRUSH, which computes a consistent, reliable location for the required data. The node goes directly to the data, eliminating the need for -- and the latency introduced by -- centralized controllers and lookup servers.
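To make the lookup-free idea concrete, here is a toy sketch -- not Ceph’s actual CRUSH implementation, but rendezvous hashing, a simpler algorithm with the same key property: every client computes the same placement locally, with no metadata server in the path. The `place` function and `osd.N` node names are illustrative:

```python
import hashlib

def place(object_id: str, nodes: list, replicas: int = 3) -> list:
    """Deterministically pick which nodes hold an object.
    Every client that runs this gets the same answer, so no
    lookup server is needed (a toy stand-in for CRUSH)."""
    def score(node: str) -> int:
        # Hash the (object, node) pair; highest scores win.
        key = f"{object_id}:{node}".encode()
        return int(hashlib.sha256(key).hexdigest(), 16)
    return sorted(nodes, key=score, reverse=True)[:replicas]

nodes = [f"osd.{i}" for i in range(8)]
print(place("vm-image-42", nodes))  # any client computes the same 3 nodes
```

Because placement is a pure function of the object name and the cluster membership, a compute node can go straight to the data.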
Data is spread redundantly across the entire cluster, replicated across distributed nodes to ensure reliability. If a node fails or is removed, the cluster retrieves the lost replicas from other nodes and redistributes them among the remaining nodes. This provides extraordinary data storage scalability -- thousands of client hosts or KVM virtual machines accessing petabytes to exabytes of data.
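The same toy placement model can illustrate recovery: when a node disappears, only the objects that had a replica on it need new homes, while everything else stays put. Again, this is a rendezvous-hashing sketch standing in for CRUSH, with illustrative `osd.N` and `obj-N` names:

```python
import hashlib

def place(object_id, nodes, replicas=3):
    """Deterministic placement: highest-scoring nodes hold the object."""
    score = lambda n: int(
        hashlib.sha256(f"{object_id}:{n}".encode()).hexdigest(), 16)
    return set(sorted(nodes, key=score, reverse=True)[:replicas])

nodes = [f"osd.{i}" for i in range(8)]
objects = [f"obj-{i}" for i in range(500)]

before = {o: place(o, nodes) for o in objects}
survivors = [n for n in nodes if n != "osd.3"]  # osd.3 dies
after = {o: place(o, survivors) for o in objects}

# Only objects that had a replica on the failed node move.
moved = [o for o in objects if before[o] != after[o]]
assert all("osd.3" in before[o] for o in moved)
print(f"{len(moved)} of {len(objects)} objects re-replicated")
```

Minimal data movement on failure is what lets this style of cluster rebalance without a “thundering herd” of copies.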
Each one of your applications can use the object, block, or file system interfaces to the same cluster simultaneously, which means your Ceph storage system serves as a flexible foundation for all of your data storage needs.
The copy-on-write clone feature of Ceph storage helps OpenStack spin up thousands of virtual machines on the fly, likely in less time than it takes to brew your next cup of coffee. Clones can come from a golden image stored in Glance, a running image in Nova, or an existing block device in Cinder. In all cases, the data is cloned instantaneously in Ceph, giving the VM immediate access to it. Now when you request 1,000 copies of a VM for use on 1,000 servers, the image is instantly cloned on the cluster and made available immediately.
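Why is the clone instant? Because a copy-on-write clone doesn’t copy any data up front: it records a reference to the parent image and copies a block only when that block is first written. Here’s a toy model of the idea -- the `Image` class is illustrative, not Ceph’s RBD implementation:

```python
class Image:
    """Toy copy-on-write image: a clone starts empty and reads
    through to its parent until a block is overwritten locally."""
    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}             # only locally written blocks

    def write(self, offset, data):
        self.blocks[offset] = data   # copy-on-write: touch one block

    def read(self, offset):
        if offset in self.blocks:
            return self.blocks[offset]
        return self.parent.read(offset) if self.parent else b"\0"

golden = Image()
golden.write(0, b"kernel")
clones = [Image(parent=golden) for _ in range(1000)]  # instant: no data copied
clones[7].write(0, b"patched")
print(clones[7].read(0), clones[8].read(0))  # b'patched' b'kernel'
```

Creating the 1,000 clones costs 1,000 tiny metadata records, not 1,000 image copies -- which is why the real operation completes in seconds rather than days.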
You can also take a snapshot of any clone, which can then be used to preserve the point-in-time state of your VMs. These snapshots can also be stored back into Glance for later booting of new VMs. Snapshot layering enables these images to be created quickly and transparently, again with no data moving across the network.
No waiting time.
No reconfiguration of a variety of servers and hypervisors.
All nodes access the cluster simultaneously.
Thousands of VMs spin up instantly and effectively.
Just as you’d pair a fine wine with haute cuisine, pair the right storage solution with OpenStack. Those whom you serve will appreciate the menu.