Regarding the pain point of adequate hands-on support expertise for the cloud:
I have been on the incoming tech support call side of a medium-sized VMware public, private and hybrid cloud vendor for years. The most important aspect of any cloud support is making sure you have the right tools and that your people have the right training and critical-thinking skills. Let me elaborate.
If your cloud tech support is split into the groups typical of corporate support infrastructure, then you are going to be inefficient and clumsy, and you will play the usual support group blame game more often than not. You cannot use traditional groups that have blinders on, manage their own little world and have no responsibility for the overall IaaS infrastructure. You will usually find Networking, SAN, OS and Hypervisor (VMware, KVM/OpenStack, etc.) support groups, each with their own monitoring and diagnosis tools, and each passing the tech support ticket around.
To be efficient and address problems in a timely manner, you have to have a multi-discipline team of heavy-hitting third-level support techs. Techs at that level are expensive to hire. They need a CCNA-level (at minimum) understanding of networking and routing. They need a solid understanding of the SAN infrastructure and its limitations and, more importantly, the tools to graph its performance and IOPS loads. They need to be versed in monitoring and diagnosing hypervisor issues, and be able to understand the trending and history in the vCenter graphs as well as the operating system's built-in monitoring tools. They need an intermediate understanding of Linux and Windows.
Let's talk about tools. To pull together an efficient cloud troubleshooting team, you need multiple performance monitoring and graphing tools, and they need to provide trending data going back at least three months. The team needs these tools both on the desktop and on large TVs/monitors like a NOC would use. We use 42" TVs to monitor the health of our primary data center routers, the overall health of our SANs, and the overall health of our hypervisor clusters and the bare metal they run on. The techs have access to all those tools and more at their desks, with the ability to go very granular in the data they look at and to look back at historical trend data.
I simply cannot stress enough how important it is to have access to these tools and the data. We each use (at least) three large monitors at our workstations, because we need to see the SAN/volume performance, network performance, hypervisor performance and OS performance at the same time to make quick, accurate correlations when running down a problem in the cloud. Some of the tools we use are SolarWinds Orion, Cacti, SAN vendor monitoring/graphing, vCenter, top (Linux), Task Manager (Windows) and esxtop (hypervisor command line). Just as important is an understanding of how to use each one and how to correlate problems seen in one area with problems cropping up in another.
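To make the correlation idea concrete, here is a rough Python sketch of what we do by eye across those monitors: take a symptom series (here, guest disk wait) and see which layer's metric moves with it. This is only an illustration, not how any of the tools above work internally. The metric names and sample values are made up, the series are assumed to be sampled at the same intervals, and `rank_suspects` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_suspects(symptom, layers):
    """Rank layers by the absolute correlation of their metric with the symptom."""
    scores = {name: pearson(symptom, series) for name, series in layers.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical samples, all taken at the same points in time.
guest_disk_wait_ms = [4, 5, 4, 30, 45, 50, 6, 5]
layers = {
    "san_volume_latency_ms": [2, 2, 3, 25, 40, 42, 3, 2],
    "network_util_percent": [30, 28, 31, 29, 30, 32, 29, 30],
    "hypervisor_cpu_ready_percent": [1, 2, 1, 2, 1, 1, 2, 1],
}

for name, score in rank_suspects(guest_disk_wait_ms, layers):
    print(f"{name}: {score:+.2f}")
```

A high score only tells you where to look first. It does not prove cause, which is where the training and critical thinking come in.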
Add to this the customer expectation that you know something about databases, IIS, Apache, Exchange, SharePoint, various content management systems, Active Directory and firewalls/ports/ACLs, among others, and you can see why it is hard to find cloud troubleshooters who know how to do it right.