Amazon's DynamoDB promises a database service fit even for Internet-scale companies with huge data sets. Whether big data players will give up servers comes down to economics and flexibility.
The customers for DynamoDB will likely be Internet-based, services-oriented businesses with little appetite, and limited capacity, for managing and monitoring databases. Amazon cited beta customers including Elsevier, the $3 billion scientific publisher; SmugMug, the photo- and video-sharing site; and Formspring, a social networking site.
SmugMug co-founder Don MacAskill said his firm had actively encouraged Amazon to launch a NoSQL service because it didn't want to maintain its own transactional database. "We have both scaled up and scaled out our database, and every time that came with headaches," MacAskill said during a webcast with Amazon Chief Technology Officer Werner Vogels on Wednesday. "We've always wanted to not have to deal with the database problem, so we could put those resources into improving our product and customer experience."
For those who have reservations about trusting a cloud-based service, it couldn't have helped that Amazon's launch webinar with Werner Vogels crashed, leaving gaps in the audio and video streams and sparking hundreds of embarrassing tweets with cracks about "scalable and reliable performance," and that was with only 5,000 or so webinar registrants. (We're not sure whether Amazon's planning or its third-party Livestream service was to blame, but either way it was a black eye for DynamoDB.)
For a truly Internet-scale business, the real question will be whether big data is something you can do affordably in the cloud. Storage starts at $1 per gigabyte per month. Incoming data transfer is free. Outgoing transfer is also free for the first 10 terabytes per month, as is transfer between AWS services (such as Elastic MapReduce and S3). Beyond 10 terabytes, taking data out of the service costs $0.12 per gigabyte through 40 terabytes, with lower rates up to 350 terabytes. Throughput capacity costs $0.01 per hour for every 10 units of write capacity and $0.01 per hour for every 50 units of read capacity.
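To make the throughput pricing concrete, here is a minimal Python sketch that turns provisioned read and write units into an hourly and monthly bill at the launch rates quoted above. It assumes charges scale linearly with provisioned units (no rounding up to whole 10- or 50-unit blocks) and a 730-hour average month; both are simplifying assumptions, and the function names are hypothetical, not part of any AWS API. Storage and data transfer are deliberately left out.

```python
# Launch-era DynamoDB throughput rates as quoted in the article (USD).
WRITE_RATE_PER_HOUR = 0.01  # charged per block of 10 write-capacity units
WRITE_BLOCK = 10
READ_RATE_PER_HOUR = 0.01   # charged per block of 50 read-capacity units
READ_BLOCK = 50
HOURS_PER_MONTH = 730       # assumption: average month (8,760 hours / 12)


def hourly_throughput_cost(write_units: int, read_units: int) -> float:
    """Hourly cost, assuming charges scale linearly with provisioned units."""
    write_cost = (write_units / WRITE_BLOCK) * WRITE_RATE_PER_HOUR
    read_cost = (read_units / READ_BLOCK) * READ_RATE_PER_HOUR
    return write_cost + read_cost


def monthly_throughput_cost(write_units: int, read_units: int) -> float:
    """Hourly cost multiplied by the assumed hours in a month."""
    return hourly_throughput_cost(write_units, read_units) * HOURS_PER_MONTH


if __name__ == "__main__":
    # Example: 1,000 write units and 5,000 read units.
    print(f"${hourly_throughput_cost(1000, 5000):.2f}/hour")
    print(f"${monthly_throughput_cost(1000, 5000):,.2f}/month")
```

In this example, 1,000 write units and 5,000 read units each work out to 100 billing blocks, or $1 per hour apiece, so throughput alone runs about $2 per hour, or roughly $1,460 a month, before any storage is counted.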
Amazon will surely offer volume discounts, but at that $1-per-gigabyte starting price, storage alone would add up to more than $120,000 per year for 10 terabytes. That ongoing service cost may far exceed the investment in on-premises clusters of open-source software on commodity hardware, managed by a crew of experienced administrators. For businesses such as SmugMug, it seems, convenience and bandwidth constraints outweigh long-term total-cost-of-ownership concerns.
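The storage figure is simple arithmetic, sketched below in Python. It uses the $1 per gigabyte per month list price quoted above and ignores volume discounts. One assumption matters here: treating a terabyte as 1,024 gigabytes gives $122,880 a year for 10 terabytes, while a decimal terabyte (1,000 gigabytes) gives exactly $120,000. The constant and function names are illustrative only.

```python
STORAGE_RATE_PER_GB_MONTH = 1.00  # launch list price quoted above, before discounts
GB_PER_TB = 1024                  # assumption: binary terabytes; use 1000 for decimal
MONTHS_PER_YEAR = 12


def annual_storage_cost(terabytes: float) -> float:
    """Yearly storage bill at a flat per-gigabyte monthly rate."""
    gigabytes = terabytes * GB_PER_TB
    return gigabytes * STORAGE_RATE_PER_GB_MONTH * MONTHS_PER_YEAR


if __name__ == "__main__":
    print(f"${annual_storage_cost(10):,.0f} per year for 10 TB")
```

Running it prints `$122,880 per year for 10 TB`, consistent with the "more than $120,000" estimate.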
Cassandra services provider DataStax questions DynamoDB's performance, rather than its cost, in a long blog post full of technical comparisons of operations per second, DynamoDB's requirements for vector clocks, and Cassandra's support for secondary indexes. "Cassandra offers not only the fully distributed architecture of Dynamo, but also… major performance improvements that scale for big data, and of course, the flexibility for either on-premise or cloud deployments," said DataStax CEO Billy Bosworth in a statement. (Ironically, the cloud-based Cassandra deployment DataStax points to runs on AWS.)
The bottom line is this: if Amazon can court and reliably serve a gaggle of truly high-scale Internet businesses with DynamoDB, it will change more minds about the wisdom and cost-effectiveness of cloud-based services as a long-term production environment. All the better if those turn out to be high-profile companies that are willing to sing Amazon's praises (and SmugMug and Elsevier ain't a bad start on that score).
There have been Amazon outages before, and some dependent businesses fared better than others, depending on just how much they entrusted to Amazon and to what degree they had backup systems available.
When you combine big data, Internet-scale business, and transactional databases in the same sentence, there will be doubters, even among those who are already sold on NoSQL databases.