There are already clustering systems for MySQL, including MySQL Cluster Carrier Grade Edition, from Oracle. But clustering is designed for high availability more than speed of response. Data is replicated across all nodes of a cluster and changes to data have to be propagated to the nodes. The bigger the cluster, the more latencies are built into the database response.
Another example is the commercial product Clustrix, which has MySQL-compatible features and is sometimes dropped in as a scalable substitute for the open source code, but it is not MySQL itself.
Liran Zelkha, VP of business development, said in an interview that his firm is seeking to allow existing MySQL users to keep their databases and applications as is, but give them an easy way to scale out without making changes. That’s a tall order, one that has eluded MySQL users, who continue to debate in forums and blogs how best to scale their systems. See Morgan Tocker’s Aug. 6, 2009, MySQL Performance Blog post, “Why you don’t want to shard.” Sharding in most cases forces developers to change their applications so that the application is aware that it is dealing with shards, or segments, of a database, not the whole database itself. If the number of shards changes, they will have to go back and change the application again.
ScaleBase, on the other hand, relies on putting query intelligence into a front-end system that recognizes from a query what part of the database it needs to access. It then guides the query to one of several shards, set up at the direction of the database owner. Each shard resides on its own server with its own section of the total data from the database; some of the frequently accessed data is often stored in RAM on the server; the rest is on the server’s disk.
Query processing thus takes place in a distributed fashion close to the data, an idea borrowed from the NoSQL movement. Furthermore, performance improves as queries are distributed across shards, Zelkha said.
A MySQL application could use ScaleBase 1.0, which became available Monday, to shard a central database into four systems, each with a quarter of the original data, then steer queries appropriate to each shard to the right server. If an employee database for 1,000 employees were divided into four shards, each query should take one-fourth the time it did before, he said.
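To make the four-shard employee example concrete, here is a minimal Python sketch of how a front end might pick a shard from the key in a query. It is an illustration under stated assumptions, not ScaleBase's actual mechanism: the range boundaries, the host names, and the functions `shard_for_employee` and `route` are all hypothetical.

```python
from bisect import bisect_left

# Assumed layout: employee ids 1..1000 split into four equal range shards.
SHARD_UPPER_BOUNDS = [250, 500, 750, 1000]  # inclusive upper id of each shard
SHARD_HOSTS = [  # hypothetical host names, one server per shard
    "shard0.example",
    "shard1.example",
    "shard2.example",
    "shard3.example",
]


def shard_for_employee(employee_id: int) -> int:
    """Return the index of the shard whose id range contains employee_id."""
    if not 1 <= employee_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"employee id out of range: {employee_id}")
    # bisect_left finds the first upper bound that is >= employee_id.
    return bisect_left(SHARD_UPPER_BOUNDS, employee_id)


def route(employee_id: int) -> str:
    """Return the host that should receive a query for this employee."""
    return SHARD_HOSTS[shard_for_employee(employee_id)]
```

Because the lookup lives in the routing layer, the application keeps issuing the same query; changing the number of shards would mean editing the boundary table rather than the application.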
Under more mixed data sets, a certain amount of load balancing by query interception and distribution is still going on under the covers. If one shard were the target of most queries, however, it would be necessary to divide it in two or take additional steps to ensure performance.
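One way to notice that a single shard is absorbing most of the traffic is to count where queries are routed. The Python sketch below assumes a log of shard indexes, one per routed query; the function name `hot_shards` and the 50 percent threshold are illustrative choices, not anything the article attributes to ScaleBase.

```python
from collections import Counter
from typing import Iterable, List


def hot_shards(routed_shard_ids: Iterable[int], threshold: float = 0.5) -> List[int]:
    """Return shards that received more than `threshold` of all queries."""
    counts = Counter(routed_shard_ids)
    total = sum(counts.values())
    if total == 0:
        return []  # no traffic observed, so nothing can be hot
    return sorted(shard for shard, n in counts.items() if n / total > threshold)
```

A shard flagged this way would be a candidate for the split described above.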
ScaleBase takes advantage of shards in an additional way. A shard may have a copycat slave shard. The primary shard can be used to accept writes or data updates from the MySQL system, while the slave satisfies requests for reads. The splitting of reads and writes “is a good, easy way to scale reads and it shows great performance improvements,” Zelkha said. He added that ScaleBase “can scale MySQL to an unlimited scale,” although the largest number of shards his company has created is 100.
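The read/write split can be sketched in a few lines of Python. This is a simplified illustration, not ScaleBase's implementation: the `Shard` class, the host names, and the rule of classifying a statement by its first keyword are all assumptions, and a real router would also have to account for transactions and for reads that must see a write that has just been made.

```python
from dataclasses import dataclass

# Assumed rule: statements starting with these keywords only read data.
READ_VERBS = {"SELECT", "SHOW"}


@dataclass
class Shard:
    primary: str  # host that accepts writes
    replica: str  # slave copy that serves reads


def target_for(shard: Shard, sql: str) -> str:
    """Return the host that should run this statement."""
    words = sql.split()
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    return shard.replica if verb in READ_VERBS else shard.primary


shard = Shard(primary="shard0-primary.example", replica="shard0-replica.example")
read_host = target_for(shard, "SELECT name FROM employees WHERE id = 7")
write_host = target_for(shard, "UPDATE employees SET name = 'Ann' WHERE id = 7")
```

Here `read_host` resolves to the replica and `write_host` to the primary, so read traffic can grow without adding load to the server that takes updates.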
ScaleBase lists SolarEdge, a solar system equipment supplier, and BuildFax, an online supplier of property histories, as customers. The product has been available in pre-release form for a year and a half. ScaleBase is priced at $1,500 for an annual developer subscription and $5,000 for an enterprise subscription. The firm has venture backing from the Cedar Fund in Boston and headquarters with a development team in Newton, Mass.