InformationWeek · News · 11/11/2015 10:06 AM

Kubernetes Yields 'Operations Dividend,' Still Working On Scalability

On the second day of KubeCon, developers describe the gains and challenges that are still ahead for the open source container orchestration system, Kubernetes.


Making developers responsible for how an application runs after it's finished may be a central tenet of DevOps. However, it doesn't necessarily make a lot of sense, according to Joe Beda, the entrepreneur in residence at Accel Partners, and a former Google software engineer who cofounded the Google Compute Engine project.

Beda spoke on Tuesday, Nov. 10, to attendees at KubeCon, a conference in San Francisco for users and developers of the Kubernetes container cluster management system.

Unlike many organizations, Google doesn't hold developers directly accountable for how their code runs in production. Instead, the company prefers to have someone with development experience who wants to specialize in operations become a site reliability engineer.

Site reliability engineers, or SREs, are responsible for keeping Google Search, Maps, and other production systems running. They're responsible for continuous integration of new code, and other common production tasks. But they're also still programmers.

Instead of working on applications, they work on automating the processes and procedures of the data center to make them more efficient, Beda said.

One of the systems to come out of SRE efforts was the Borg cluster management system, still in use and running Bigtable, the GFS and CFS storage systems, and other components of Google operations. As the cofounder of Compute Engine -- Google's Infrastructure-as-a-Service -- Beda said he sometimes ended up in "a few places where I had to run Compute Engine and I did a bad job of it. I worked to get our stuff on Borg clusters. The SREs on Borg had the expertise" to run clusters efficiently.

Another Google engineer, Brendan Burns, who cofounded the Kubernetes project, told attendees on Monday that the system was designed to match newly generated containers with the right resources on a server cluster. Kubernetes uses the concept of pods to put related containers that need to share resources on a single host within a cluster.
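The pod grouping Burns described can be sketched as a short manifest. This is an illustrative example, not one shown at the conference: the container names (`web`, `log-sidecar`), image choices, and the shared `emptyDir` volume are all hypothetical, chosen only to show two related containers co-scheduled on one host and sharing a resource.

```yaml
# Hypothetical pod: two related containers that Kubernetes schedules
# together on a single host, sharing a volume -- the grouping
# Kubernetes calls a pod.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger          # illustrative name
spec:
  volumes:
    - name: shared-logs          # the resource both containers share
      emptyDir: {}
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/nginx
    - name: log-sidecar          # reads what the web container writes
      image: busybox
      command: ["sh", "-c", "tail -F /logs/access.log"]
      volumeMounts:
        - name: shared-logs
          mountPath: /logs
```

Because both containers mount the same volume and share the pod's network namespace, Kubernetes guarantees they land on the same node together, which is exactly the co-location problem pods were designed to solve.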

(Image: ClaudioVentrella/iStockphoto)

In his talk, "The Operations Dividend," Beda described effective DevOps as the place where "the people operating the application are in great communication with the people writing the application." But there's a payoff when effective operations people are given the chance to automate more of their tasks, as SREs do.

In Beda's view, the "Operations Dividend" happens when developers and operations people understand the relationship between the right degree of simplicity and operational costs.

"As things get more complex, costs tend to go up," Beda warned. They don't go up linearly as services are added; on the contrary, they escalate as complexity begins to outstrip the operations staff's ability to understand it.

Breaking Complex Systems Down

Very complex systems need to be broken down into smaller units that are easier to manage, update, and maintain. Too many microservices, however, can also add to costs.

"There's a sweet spot," Beda said, where a set of microservices will run well together and should be treated as a unit, sometimes as a Kubernetes pod, or a set of containers on a single host. In other cases, complex, interrelated services need to be broken apart into more discrete units in order to further the understanding of what they're doing, how they're running, and how they can be fixed when something goes wrong.

When an organization finds the sweet spot for its production systems, it gains a dividend where it needs less physical capacity and sometimes fewer people to keep the data center running. That's the operational dividend that Kubernetes, Docker containers, and Borg clusters yield at Google, and the gain yields more hardware capacity for developers and more software engineers with the time to do development.

More Work Ahead

Bob Wise, chief technologist for cloud infrastructure at Samsung SDS America and head of its Kubernetes consulting practice, told KubeCon attendees Tuesday that Kubernetes helps container managers scale up their resources today, but more work needs to be done to make it scale better in the future.

Wise has led the scaling-oriented K8Scale Special Interest Group of the Kubernetes project since it was formed following the release of Kubernetes 1.0 last July. The group has met weekly since its formation in August, Wise said.

Samsung wants to use Kubernetes at a scale that's still difficult to achieve. "We want really large cluster and lots of sharing [of resources within the cluster]," said Wise. "We want to be the Google infrastructure that's for everybody else."

Tuning Kubernetes clusters, however, "is not going to get us to the goal." Tuning is too piecemeal and can't overcome the barriers that large implementers encounter as they try to get Kubernetes to scale up further. Google has never disclosed the scale at which it operates container clusters inside its data centers, but it is suspected of being the primary practitioner of getting containers to run at scale.

[Want to learn more about the Kubernetes 1.1 release? See Kubernetes Augments Container Management.]

When Google's Kelsey Hightower, master of ceremonies for KubeCon, asked at one point during the event who in the audience was running the largest Kubernetes cluster in production, the nod went to Jack Foy, engineering manager for Whitepages in Seattle. Foy said he was operating two Kubernetes clusters in production, one of 10 nodes and a second of 25 nodes.

There are larger Kubernetes clusters in research and lab settings, sponsored by the Cloud Native Computing Foundation, but the modest size of Foy's clusters illustrated how far Kubernetes has to go before it is used for container orchestration in large-scale production settings.

Google donated its core Kubernetes code to the foundation last July.

"End-to-end optimizations are where we find the biggest gains," said Wise, but such results mean re-engineering the Kubernetes system to achieve those optimizations, a process that will continue in the K8Scale SIG and the project as a whole.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld, and former technology editor of Interactive ...

Comments
Li Tan (User Rank: Ninja), 11/12/2015, 1:25:06 AM
Re: Kubernetes cluster: how it runs
I have some doubts on this - what mechanism can achieve this 100% reliability nowadays?

CharlesB21101 (User Rank: Strategist), 11/11/2015, 1:46:21 PM
Kubernetes cluster: how it runs
Whitepages' Jack Foy, when asked in the Expo hall how his Kubernetes cluster was running, said it ran fine. Any downtime? Only when he decided to take it down, he said. No unexpected downtime, he said.