In a column a few days ago, I questioned the value of infrastructure-as-a-service offerings based on their lack of adherence to Moore's Law. My thesis: While CPU performance and drive storage capacity continue to climb at exponential rates, IaaS vendors aren't passing those implied cost savings back to their customers. I received two sorts of responses to that column: those thankful for the oversimplified example I provided, and those wanting more concrete numbers applied to real systems.
I took some time to do a back-of-the-napkin calculation for storage, and I'll share my results here. Before jumping into the numbers, however, it's important to know that it's pretty much impossible to do an apples-to-apples comparison between 2006 IaaS prices (the year Amazon first offered EC2 and S3) and 2012 prices. Sure, for storage systems you can compare drive capacity, but that's not the full story. An iSCSI drive array in 2006 would typically come with two to four Gigabit Ethernet adapters, while today you'll get a few 10-Gbps Ethernet adapters. You'll also get six years of advances in firmware and software. So let me say right from the start: This not only isn't an apples-to-apples comparison, but you probably don't want one.
What we want to understand are the relative improvements in cost, performance, and reliability that IaaS vendors have delivered over six years compared to the improvements you'd get from buying systems the old-fashioned way and running them yourself. For no other reason than convenience, I chose to compare storage prices. I was able to find some good historical data that I think makes for a compelling comparison. I decided to compare Amazon's S3 prices from 2006 until now with the prices of a hard drive and an actual storage array over the same period.
I threw in the storage array because while the price of a hard drive is obviously going to change radically over six years, the price of other storage system components won't change that much. Power supplies and other hardware don't adhere to Moore's Law, and certainly there are significant costs in developing firmware and software for drive arrays that also don't drop exponentially. So it would be fair to expect that drive prices would change the most, followed by array prices, followed by the price of the Amazon offering, which must take into account other overhead required to run the storage system. The relative magnitudes of the differences are what's important and telling, and that's what we want to understand.
Since quantity makes a difference, we'll assume that we're looking at storing 50 terabytes of data, and that we'll look at the total cost over four years. This is back-of-the-napkin; we know there are lots of costs I'm not including in that four-year number, including amortization, failed drives, additional hardware requirements, maintenance contracts, and the time value of money. A more detailed analysis is critical for a buying decision, but I think we can illustrate some fundamentals without hauling out a spreadsheet (if anyone wants to do that, please do, and I'll post it and give you credit for the work).
Once you find the historical data, both the Amazon and raw disk calculations are pretty easy to do. For Amazon, the S3 2006 price was $0.15 per gigabyte per month, so the total cost for 50 TB for four years--assuming a contract with no clause for reduced price over its term--is $360,000. This year, the Amazon price is down to $0.108 per gigabyte per month, so a similar four-year contract for 50 TB would now be $259,200. So it cost 39 percent more in 2006 to store the 50 TB than it does now. Note that we haven't calculated any fees for accessing or transferring the data--just for storing it. We'll get to other fees later. Nonetheless, Amazon is lowering prices, which seems like a good thing.
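For anyone who wants to check the napkin math, here's the storage-only calculation in a few lines of Python, using the per-gigabyte prices from the column (the function name and structure are mine, not anything Amazon publishes):

```python
def s3_storage_cost(price_per_gb_month, terabytes, years):
    """Storage-only cost; ignores request and data transfer fees."""
    gigabytes = terabytes * 1000  # decimal units, as S3 bills
    return price_per_gb_month * gigabytes * years * 12

cost_2006 = s3_storage_cost(0.15, 50, 4)    # $360,000
cost_2012 = s3_storage_cost(0.108, 50, 4)   # $259,200
premium = cost_2006 / cost_2012 - 1          # ~0.39, i.e. 39% more in 2006
```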
Raw hard drive costs certainly have dropped radically. In 2006, a Seagate Barracuda 7200 RPM 500-GB drive would run you about $300. For 50 TB, you'd need 100 of them, so $30,000. Today, a 2-TB Seagate Barracuda costs $120. You'll need 25 of them to get you to 50 TB, so that's $3,000.
Note that you're just buying the raw capacity here. If you opt for RAID 10, you'll need double the number of drives, while for RAID 6 you'll need about 25 percent more. The ratio between the 2006 and 2012 prices stays the same either way: As these numbers show, the 2006 price is 10 times the current price. Surprising, right? Is it reasonable that Amazon is passing along a 39 percent difference when raw drive prices have moved by a factor of 10?
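The raw drive math, including the RAID overhead factors just mentioned, looks like this (drive prices are the column's figures; the helper function is my own sketch):

```python
import math

def drives_and_cost(capacity_tb, drive_size_tb, price_per_drive,
                    raid_overhead=1.0):
    """Whole drives needed for a target capacity, and their total cost."""
    drives = math.ceil(capacity_tb / drive_size_tb * raid_overhead)
    return drives, drives * price_per_drive

print(drives_and_cost(50, 0.5, 300))   # 2006: 100 drives, $30,000
print(drives_and_cost(50, 2.0, 120))   # 2012: 25 drives, $3,000

# RAID overhead changes the drive count, not the 10x price ratio:
print(drives_and_cost(50, 2.0, 120, raid_overhead=2.0))   # RAID 10: 50 drives
print(drives_and_cost(50, 2.0, 120, raid_overhead=1.25))  # RAID 6: 32 drives
```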
Let's see how the drive array pricing works out.
First, this exercise is far more subjective, and data for it is harder to find. Here's what I found: In 2006, EqualLogic released its PS3000X line. Among other things, it was the first EqualLogic product to use serial-attached SCSI drives. Loaded with 16 300-GB drives spinning at 10,000 RPM, the system had a maximum configuration of 4.8 TB. However, this is raw capacity. After applying RAID 6, the usable capacity is 3.5 TB. The only reference I could find to its price was from the very reliable Register, so after a quick conversion from 2006 British pounds to U.S. dollars, that configuration works out to $76,900. You'd need about 14.3 of these to get to 50 TB, so the cost is roughly $1.1 million. So at least in 2006, Amazon's service was looking like a pretty good deal for four years. After all, Amazon Web Services is doing backups and guaranteeing 99.9% availability.
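The array arithmetic, step by step, with the column's figures (the fractional array count is just for the ratio; in practice you'd buy whole arrays):

```python
import math

raw_tb = 16 * 0.3        # 16 x 300 GB drives = 4.8 TB raw
usable_tb = 3.5          # usable capacity after RAID 6
price_per_array = 76_900 # converted from 2006 British pounds

arrays_needed = 50 / usable_tb              # ~14.3 arrays for 50 TB
fractional_cost = arrays_needed * price_per_array   # ~$1.1 million

# Buying whole arrays pushes it a bit higher:
whole_cost = math.ceil(arrays_needed) * price_per_array  # 15 x $76,900
```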