SmugMug co-founder Don MacAskill said his firm had actively encouraged Amazon to launch a NoSQL service because it didn't want to maintain its own transactional database. "We have both scaled up and scaled out our database, and every time that came with headaches," MacAskill said during a webcast with Amazon Chief Technology Officer Werner Vogels on Wednesday. "We've always wanted to not have to deal with the database problem, so we could put those resources into improving our product and customer experience."
For those who have reservations about trusting a cloud-based service, it couldn't have helped that Amazon's launch webinar with Werner Vogels crashed, creating gaps in the audio and video streams and sparking hundreds of embarrassing tweets cracking wise about "scalable and reliable performance"--and there were only 5,000 or so webinar registrants. (We're not sure whether Amazon's planning or its third-party Livestream service was to blame, but it was a black eye for DynamoDB either way.)
For a truly Internet-scale business, the real question will be whether big data is something you can do affordably in the cloud. Storage costs for the service start at $1 per gigabyte per month. Incoming data transfer is free, as is transfer between AWS services (such as Elastic MapReduce and S3) and the first 10 terabytes of outgoing data per month. Beyond that, taking data out of the service costs $0.12 per gigabyte through 40 terabytes, with lower rates up to 350 terabytes. Throughput capacity costs $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
Amazon will surely offer volume discounts, but that $1-per-gigabyte starting point for storage alone adds up to more than $120,000 per year for 10 terabytes. That ongoing service cost may far surpass the investment in on-premises clusters of open-source software on commodity hardware, managed by a crew of experienced administrators. For businesses such as SmugMug, it seems convenience and bandwidth constraints outweigh long-term total-cost-of-ownership concerns.
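To make that figure concrete, here's a back-of-the-envelope sketch of the launch-era pricing quoted above. The rates and the `monthly_cost` helper are our own illustrative assumptions based on the numbers in this article, not an official Amazon pricing calculator, and they ignore volume discounts and data-transfer charges.

```python
# Launch-era DynamoDB rates as quoted in this article (assumptions, not an API).
STORAGE_PER_GB_MONTH = 1.00   # $1 per gigabyte of storage per month
WRITE_RATE = 0.01 / 10        # $0.01/hour for every 10 units of write capacity
READ_RATE = 0.01 / 50         # $0.01/hour for every 50 units of read capacity
HOURS_PER_MONTH = 730         # rough monthly average

def monthly_cost(gb_stored, write_units=0, read_units=0):
    """Storage plus provisioned-throughput cost for one month, in dollars."""
    storage = gb_stored * STORAGE_PER_GB_MONTH
    throughput = (write_units * WRITE_RATE + read_units * READ_RATE) * HOURS_PER_MONTH
    return storage + throughput

# 10 terabytes of storage alone, before any throughput or transfer charges:
annual_storage = monthly_cost(10 * 1024) * 12
print(round(annual_storage))  # 122880 -- the "more than $120,000 per year" figure
```

Even a modest provisioned-throughput allotment adds to that total, which is why the comparison against self-managed commodity clusters matters at this scale.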
Cassandra services provider DataStax questions DynamoDB's performance, rather than its cost, in a long blog post full of technical comparisons around operations per second, DynamoDB's requirement for vector clocks, and Cassandra's support for secondary indexes. "Cassandra offers not only the fully distributed architecture of Dynamo, but also… major performance improvements that scale for big data, and of course, the flexibility for either on-premise or cloud deployments," said DataStax CEO Billy Bosworth in a statement. (Ironically, the cloud-based Cassandra deployment DataStax points to is running in AWS.)
The bottom line is this: if Amazon can court and reliably serve a gaggle of truly high-scale Internet businesses with DynamoDB, it will change more minds about the wisdom and cost-effectiveness of cloud-based services as a long-term production environment. All the better if those turn out to be high-profile companies that are willing to sing Amazon's praises (and SmugMug and Elsevier ain't a bad start on that score).
There have been Amazon outages before, and some dependent businesses fared better than others, depending on just how much they entrusted to Amazon and to what degree they had backup systems available.
When you combine big data, Internet-scale business, and transactional database all in the same sentence, there will be doubters--even among those who are already sold on NoSQL databases.