Amazon DynamoDB: Big Data's Big Cloud Moment
Amazon's DynamoDB promises a database service fit even for Internet-scale companies with huge data sets. Whether big data players will give up servers comes down to economics and flexibility.
Amazon Web Services launched the DynamoDB NoSQL database service on Wednesday, marking the company's boldest appeal yet for companies to trust a cloud-based service as a platform for running a business.
The question now is whether Amazon really can deliver big-data scale at a reasonable cost through a cloud-based operation.
The DynamoDB service is akin to giving Amazon's 3-year-old SimpleDB service a massive boost in terms of capacity and performance. That's because this new service is based on Dynamo, the NoSQL database Amazon developed internally and has used to run parts of its massive consumer commerce Web site since 2007.
A white paper about Dynamo published by Amazon inspired open-source NoSQL products including Apache Cassandra, Riak, Voldemort and others. Cassandra is now running high-scale Web businesses including Netflix, Expedia and Twitter.
What all these NoSQL products have in common is incredibly high scalability and flexibility to react to changing demands for the information held in these databases. The scalability is tied to distributed processing across tens, hundreds or even thousands of servers (tens of thousands in Amazon's case). The flexibility stems from the fact that data in a NoSQL database doesn't have to conform to a predefined schema or data model.
Scalability and flexibility requirements go hand in hand for big Internet operations, like Amazon's retail operation, because rigid data modeling requirements become too cumbersome when you're running a rapidly changing business and you need to start adding new data or change the way you work with current data.
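To make those two properties concrete, here is a minimal Python sketch. It is not Amazon's code or the DynamoDB API; the item fields, the `node_for` helper and the four-server cluster are all hypothetical. It shows items that share no fixed schema being spread across servers by hashing each item's key. Simple modulo hashing is used here for brevity; this is an assumption of the sketch, not a description of how Dynamo assigns data to servers.

```python
import hashlib

def node_for(key: str, node_count: int) -> int:
    """Pick a server for a key with a stable hash (plain modulo, for illustration only)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count

# Hypothetical items: each one carries whatever attributes it needs,
# and no schema has to be declared before a new attribute appears.
items = [
    {"id": "order-1001", "customer": "alice", "total": 42.50},
    {"id": "order-1002", "customer": "bob", "total": 17.00, "gift_wrap": True},
    {"id": "review-77", "product": "B000123", "stars": 5, "text": "Works well"},
]

def partition(items, node_count):
    """Group items by the server their key hashes to."""
    nodes = {n: [] for n in range(node_count)}
    for item in items:
        nodes[node_for(item["id"], node_count)].append(item)
    return nodes

if __name__ == "__main__":
    for node, stored in partition(items, 4).items():
        print(node, [item["id"] for item in stored])
```

Adding a server in this toy version would reshuffle most keys, which is one reason production systems use more careful placement schemes.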
But the key appeal of Amazon DynamoDB is that it's a managed service. When you need more database capacity, you don't have the burdens of provisioning and configuring hardware, operating and scaling distributed databases, patching software, partitioning databases and then scaling up the hardware platform. You can dial up and dial down the capacity as needed in flexible cloud fashion, and you can also choose between peak performance and, if latency isn't crucial, "eventual consistency" (meaning possible processing delays of a few seconds) at a lower cost.
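A toy model may help show what the "eventual consistency" trade-off means in practice. The Python sketch below is an illustration only, not DynamoDB's implementation or API: the `ToyReplicatedStore` class, its single replica and the explicit `replicate()` step are all assumptions made for the example. A write lands on a primary copy right away, and a cheaper read served from the replica can miss that write until replication catches up.

```python
class ToyReplicatedStore:
    """Hypothetical two-copy store: one primary, one lagging replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def put(self, key, value):
        # Writes are applied to the primary immediately and queued for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stand-in for the background copy that happens "eventually".
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def get(self, key, consistent_read=False):
        # A consistent read goes to the primary; otherwise the replica answers.
        source = self.primary if consistent_read else self.replica
        return source.get(key)


store = ToyReplicatedStore()
store.put("cart:alice", ["book"])

stale = store.get("cart:alice")                        # replica has not caught up yet
fresh = store.get("cart:alice", consistent_read=True)  # primary already has the write
store.replicate()
caught_up = store.get("cart:alice")                    # replica now agrees with primary

if __name__ == "__main__":
    print(stale, fresh, caught_up)
```

In this model the stale read returns nothing at first, while the consistent read and the later replica read both return the new value.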
DynamoDB raises the bar for Amazon because it's about big data, and in this case the company isn't talking about running small-scale sandbox environments for development and testing. Nor is it talking about handling only the overflow from spiky on-premises demands. Amazon is saying "trust us to run the core of a Web-scale business."
Another appeal of DynamoDB is that it's closely tied to Amazon's Hadoop-based Elastic MapReduce service. Customers using DynamoDB will be able to use data from within DynamoDB (and the AWS S3 storage service) in MapReduce processes and downstream analytic queries, something many Internet-scale businesses are routinely doing or anticipating doing as their businesses scale up. So it's a single environment with scalable transactional processing, scalable storage, and scalable MapReduce and analytic capabilities.
To put DynamoDB in perspective, keep in mind that the potential customer base here--as of today--is a niche of a niche. As you can read in our just-released InformationWeek "State of Database Technology" report, fully 60% of the 760 technology professionals surveyed have "never heard of or had no interest in" transactional NoSQL databases. Only 4% have actual hands-on experience with NoSQL databases (2% in pilot stage and 2% in production) while 36% say they are investigating the technology.
In a separate question about cloud computing, 55% of those 760 respondents say they have no plans to use off-premises or cloud-hosted services for their primary transactional database; another 29% are researching that possibility. Among the 12% that are actually taking the plunge, 5% are piloting, another 5% are using the cloud to "host the environment but are managing the database themselves," and 2% are "using the cloud for a fully managed database service," a description that matches what Amazon is providing with DynamoDB.
Others see potential here as well: Oracle recently launched an Oracle Big Data Appliance that includes a NoSQL database.
However, if you narrow it down to just those interested in using NoSQL and then those willing to put it in the cloud, it's not a mainstream offering. I'd guess we're talking about thousands of companies and tens of thousands of developers as prospects, versus the hundreds of thousands of companies and millions of developers who use conventional databases on premises.
That could change quickly. Amazon will no doubt attract thousands of tire kickers, particularly since it's making available a free DynamoDB tier that provides 100 megabytes of storage, and five writes and ten reads per second (with up to 40 million requests per month).
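As a quick sanity check on those free-tier figures, the short Python calculation below works out how many requests the quoted rates allow if they are sustained around the clock. The assumption that the monthly cap is simply the per-second rates multiplied out over a month is mine, not something Amazon states in the announcement as described here.

```python
WRITES_PER_SECOND = 5
READS_PER_SECOND = 10
SECONDS_PER_DAY = 24 * 60 * 60

def monthly_requests(days: int) -> int:
    """Total reads plus writes if the free-tier rates run nonstop for `days` days."""
    return (WRITES_PER_SECOND + READS_PER_SECOND) * SECONDS_PER_DAY * days

if __name__ == "__main__":
    print(monthly_requests(30))  # 38880000
    print(monthly_requests(31))  # 40176000
```

Fifteen requests per second comes to roughly 39 million requests in a 30-day month and just over 40 million in a 31-day month, which lines up with the "up to 40 million requests per month" figure.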