Interest in database as a service -- such as Amazon Aurora, MySQL re-architected for the cloud -- is increasing, and Microsoft has jumped into the act with its own relational service, known as Azure SQL.
But cloud operations in themselves will impose new problems on database response times and scalability.
When it comes to being "the white knight of enterprise IT needs… cloud services don't free IT from all concerns about availability and performance, despite marketing to the contrary," wrote the database experts at ScaleArc, a supplier of database load balancing software.
The P in PaaS "does not stand for panacea… In fact, the cloud introduces shortcomings and inefficiencies that can undermine performance and jeopardize uptime," when it comes to database applications, the authors said.
"Millions of users have experienced application lag, data loss, and outages arising from database service limitations baked into platform infrastructures," warns the white paper, A Hazy Horizon: Why the Cloud Doesn’t Solve All Your Uptime and Performance Challenges.
In effect, ScaleArc argues that you need load balancing middleware between you and the cloud database service for it to perform as expected. In the process of doing so, it highlights the chief obstacles to achieving performance and availability with cloud database services. They include: network latency, I/O limitations, scalability, and hypervisor challenges, as well as availability issues.
In that order, here's what ScaleArc's experts had to say about each:
Network latency: A cloud database server to some extent is only as good as its proximity. How far away is the cloud data center with the database server? The server and its storage could be across town on a high-speed fiber optic loop or they could be hundreds of miles away. "Enterprises have no control over the number of or distance between their network hops," the authors warned.
Cloud services with a data center in a region near you are more likely to offer low latencies, since network delay grows with distance. Cisco's Global Cloud Index forecasts 485 hyperscale data centers worldwide by 2020, so chances are one is coming to a location near you, if there isn't one available already.
In the meantime, network hop latencies impose delays that can cause a database system in the process of updating synchronized data to malfunction. If network latency passes a tolerable threshold, "multiple reconnect attempts may ensue. For some applications, this step might require re-authenticating to the server with each connection attempt." When that happens, kiss effective user response time goodbye.
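The reconnect-and-reauthenticate cycle the authors describe can be sketched as follows. This is a minimal illustration, not ScaleArc's implementation; the `connect` and `authenticate` callables and the simulated flaky server are invented for the example:

```python
import time

def connect_with_backoff(connect, authenticate, max_attempts=5, base_delay=0.1):
    """Retry a database connection with exponential backoff.

    Note that authentication is repeated on every attempt, so each
    retry adds a full round trip to the user's response time.
    """
    for attempt in range(max_attempts):
        try:
            conn = connect()        # network hop to the cloud DB server
            authenticate(conn)      # re-auth happens on *every* attempt
            return conn
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # wait longer each retry
    raise ConnectionError(f"unreachable after {max_attempts} attempts")

# Simulate a server that drops the first two connection attempts.
attempts = {"n": 0}
def flaky_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("timeout")
    return "conn"

conn = connect_with_backoff(flaky_connect, lambda c: None, base_delay=0.001)
```

Three attempts, three authentication round trips: the user-visible delay compounds with every failed hop.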
If the database system is synchronizing with a copy in a different geographical region, that synchronization may be speedy or slow, depending on the subsystem's operation and network connection. If the lag exceeds the customer's "threshold for freshness of data," the primary system needs some way of deciding whether there is another replication point available in a better timeframe, the authors said.
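The "freshness" decision the authors mention reduces to comparing each replica's replication lag against the customer's threshold. A sketch, with invented replica names and lag figures:

```python
def pick_replica(replicas, max_lag_seconds):
    """Return the least-lagged replica within the freshness threshold,
    or None if every replica is too stale and the primary must serve
    the read itself."""
    fresh = [r for r in replicas if r["lag"] <= max_lag_seconds]
    if not fresh:
        return None
    return min(fresh, key=lambda r: r["lag"])

# Hypothetical replicas in three regions, lag measured in seconds.
replicas = [
    {"name": "us-east", "lag": 0.4},
    {"name": "eu-west", "lag": 7.2},
    {"name": "ap-south", "lag": 2.9},
]
best = pick_replica(replicas, max_lag_seconds=3.0)
```

Here `eu-west` is excluded for exceeding the three-second threshold, and the read is routed to the freshest remaining copy.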
I/O limitations: Not everyone realizes when they sign up for a cloud database service that there will be limitations on their number of I/Os. "The more highly shared the resources, the worse the issues become," said the ScaleArc experts, especially if the service provider makes no effort to police noisy neighbors that generate heavy I/O traffic. And if the provider spreads its existing resources across more customers to increase profit, it compounds the problem.
If the service provider is a favorite of customers relying on database replication and disaster recovery in the cloud, the I/O problem will surface all the more readily. Indeed, with the growing use of solid-state storage for database apps, database managers "will chafe at the IOPS limitations of cloud-based operation," they said.
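Cloud block storage commonly meters I/O with a credit or token bucket (AWS EBS gp2 burst credits work this way). A toy model, with invented limits, shows why a burst that drains the bucket starves the requests that follow:

```python
class IopsBucket:
    """Token-bucket model of a provider-enforced IOPS cap: tokens
    refill at `iops_limit` per second, each I/O spends one, and a
    burst that empties the bucket throttles later requests."""

    def __init__(self, iops_limit, burst):
        self.rate = iops_limit      # sustained IOPS allowance
        self.capacity = burst       # maximum burst size
        self.tokens = burst
        self.clock = 0.0

    def request_io(self, now):
        # Refill tokens for elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.clock) * self.rate)
        self.clock = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True             # I/O proceeds immediately
        return False                # I/O is throttled

bucket = IopsBucket(iops_limit=100, burst=10)
# A noisy neighbor fires 20 I/Os at the same instant: only the burst
# allowance is served, the rest are throttled.
served = sum(bucket.request_io(now=0.0) for _ in range(20))
```

Only 10 of the 20 simultaneous I/Os get through; the rest must wait for the bucket to refill, which is exactly the lag a well-behaved tenant feels from a noisy one.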
Scalability: Enterprise IT wants to be able to scale cloud database service linearly, just as its database applications do on premises by invoking a larger server. But there are no large servers in the cloud, just distributed server clusters. IT can create a server cluster easily enough in the cloud, but will its database application know how to make use of it? In many cases, shifting off a larger server to a server cluster "creates disruption at the application tier," the ScaleArc writers warn.
If the application can be re-architected to recognize, say, that a second read server has become available to it, its performance can improve 40% to 50%, since reads make up 80% of most database applications' workloads. The offloaded reads also free server capacity to execute writes, which make up the remaining 20%, another contributor to improved performance, they said.
Cloud architecture, in other words, is scale-out, relying on clusters of small servers rather than scaling up on one big server. The picture is further complicated by the fact that the small servers are actually virtual machines, each a fraction of an x86 server. "The challenge, then, becomes one of harnessing these smaller instances," the authors wrote. The cloud customer needs to check whether such load balancing comes with the cloud database service; it "is typically not the case," the authors said, and then the customer must set it up independently. "Platforms such as AWS RDS and Microsoft Azure SQL DB provide features to create readable secondary databases, but the application code still needs to be re-written to utilize those readable secondaries," they added as a second trouble point. Load balancing middleware placed in front of the small database servers can make them appear as a single database server, without the need to re-architect the applications.
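What such middleware does can be sketched in a few lines: writes go to the primary, reads rotate across the readable secondaries, and the application sees one endpoint. The backends here are just labels, and the SQL classification is deliberately crude; real products parse queries far more carefully:

```python
import itertools

class ReadWriteSplitter:
    """Minimal sketch of database load-balancing middleware: writes
    are routed to the primary, reads round-robin across readable
    secondaries, so the cluster looks like a single server."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._reads = itertools.cycle(replicas)

    def route(self, sql):
        # Crude classification: SELECTs are reads, everything else writes.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._reads)
        return self.primary

router = ReadWriteSplitter("primary", ["replica-1", "replica-2"])
targets = [router.route(q) for q in (
    "SELECT * FROM users",
    "UPDATE users SET name = 'a'",
    "SELECT 1",
)]
```

The two reads land on different replicas while the write stays on the primary, which is the 80/20 offload the authors credit for the performance gain.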
Hypervisor challenges: Virtual machines in the cloud impose their own claims on database service performance. Any given application or database server running under a hypervisor typically sits in a multi-tenant environment, with different VMs competing for the same set of resources. Moving VMs off a busy server to one that is less busy can interrupt application performance, the ScaleArc authors noted. So can the cloud provider's planned downtime for maintenance, something a running application doesn't anticipate. Servers are taken offline for maintenance changes and updates, with the service provider substituting a secondary server in their place. Such a move requires that "all clients are disconnected and then reconnected (to the secondary), creating substantial application downtime," the authors said. The process can take "many minutes, meaning users are getting lots of app error messages – the app will often hang or need to be restarted…." according to their white paper.
Availability: Cloud suppliers offer guidance on how to avoid an outage, even in the event of a regional data center failure, a point the white paper only touches on. On availability, it emphasized the need to perform a database system failover that doesn't impact users, not merely to keep the database application running continuously. ScaleArc and other database front-end specialists offer an alternative on that front: build a transaction queue as the failover occurs, then drain the queue in order to maintain database integrity as a secondary system takes over. Third-party front ends can also ease the pain of planned maintenance, which can cause a customer system to fail unexpectedly under infrastructure as a service. There are several ways to accomplish the goal, but high availability comes with more guarantees when an intermediary works with a provider across many customers to provide the right solution as an outage, planned or unplanned, occurs, the authors said.
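The queue-and-drain failover pattern the authors describe can be outlined as follows. This is a simplified sketch of the idea, not ScaleArc's product; the transaction labels and `apply` callback are invented:

```python
from collections import deque

class FailoverQueue:
    """Hold writes in a queue while no primary is available, then
    drain them in arrival order onto the promoted secondary, so no
    transaction is lost or reordered during failover."""

    def __init__(self):
        self.queue = deque()
        self.active = None          # current backend; None during failover

    def execute(self, txn, apply):
        if self.active is None:     # failover in progress: hold the write
            self.queue.append(txn)
            return "queued"
        return apply(self.active, txn)

    def promote(self, secondary, apply):
        # Promote the secondary, then replay queued transactions in order.
        self.active = secondary
        while self.queue:
            apply(self.active, self.queue.popleft())

log = []
apply = lambda backend, txn: log.append((backend, txn)) or "applied"

fq = FailoverQueue()
fq.execute("t1", apply)             # primary is down: transaction held
fq.execute("t2", apply)
fq.promote("secondary", apply)      # drain t1, t2 onto the new primary
status = fq.execute("t3", apply)    # normal operation resumes
```

From the application's point of view no transaction errored out: two were briefly queued, then applied to the secondary in order, which is what preserves database integrity across the switch.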