There's no Moore's law to sum up the growth curve of databases. But here's a rule of thumb: The amount of data stored by businesses nearly doubles every 12 to 18 months. And the very biggest--those at or near the 100-terabyte mark--probably triple every three years.
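To put those rules of thumb on a common footing, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the compounding implied by the figures above; the function names are made up for this example, and the inputs are the article's rough estimates, not measurements.

```python
import math


def annual_growth_factor(multiple: float, period_months: float) -> float:
    """Growth factor per 12 months implied by growing `multiple`-fold
    every `period_months` months (assumes steady compounding)."""
    return multiple ** (12.0 / period_months)


def months_to_multiply(multiple: float, factor_per_year: float) -> float:
    """Months needed to grow `multiple`-fold at a given yearly factor."""
    return 12.0 * math.log(multiple) / math.log(factor_per_year)


if __name__ == "__main__":
    # Rules of thumb quoted in the text (rough estimates only).
    rules = {
        "doubles every 12 months": (2, 12),
        "doubles every 18 months": (2, 18),
        "triples every 36 months": (3, 36),
    }
    for label, (multiple, months) in rules.items():
        factor = annual_growth_factor(multiple, months)
        print(f"{label}: about {100 * (factor - 1):.0f}% growth per year")
```

Converting each rule to a yearly factor makes the different figures directly comparable, which is handy when estimating how soon a given database will outgrow its storage.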
But databases aren't just getting bigger. They're also becoming more real time. Wal-Mart Stores Inc. refreshes its sales data hourly, adding a billion rows a day, which allows more complex searches. EBay Inc. lets insiders search auction data over short time periods to get deeper insight into what affects customer behavior. Data is also coming from increasingly complex sources: Radio-frequency identification readers now feed data to Wal-Mart, and Nielsen Media Research, in collecting information on TV-viewing habits, is getting data from TiVos along with the standard living-room set.
Businesses don't run the biggest databases in the world. That honor is reserved for the Stanford Linear Accelerator Center, NASA's Ames Research Center, and other government groups such as the National Security Agency, which run databases in the petabyte (1,000-terabyte) range. But because businesses run fast-response systems that need to quickly get data in and answers out, they're solving some of the most interesting problems in data management.
Businesses are dealing with the complexities of engineering databases that combine historical and real-time data from multiple sources. Designing and building the hundreds, even thousands, of tables that make up multiterabyte databases, along with the queries used to extract useful knowledge, can test the technical and management skills of any database administrator. But the advantages of big databases are obvious: Most of the largest are data warehouses for analytical tasks where more, and more-detailed, data means better insights. With real-time or near-real-time data, those insights become more valuable still. "We know how many 2.4-ounce tubes of toothpaste sold yesterday, and what was sold with them," says Dan Phillips, Wal-Mart's VP of information systems.
Business As Usual At Wal-Mart
No company better illustrates how massive volumes of data can be turned into a competitive advantage than Wal-Mart, which operates a data warehouse with, at last count, 583 terabytes of sales and inventory data. The warehouse is built on a massively parallel 1,000-processor system from data-warehouse-technology vendor Teradata, an NCR Corp. subsidiary. While some companies might consider having more than half a petabyte of data overkill, at Wal-Mart it's the way to do business.
"Our database grows because we capture data on every item, for every customer, for every store, every day," Phillips says. Wal-Mart deletes data after two years and doesn't track individual customer purchases, he says.
By refreshing the information its data warehouse holds every hour--1 billion rows of data or more are updated every day--Wal-Mart turned its data warehouse into an operational system for managing daily store operations. Store managers used to query the database at the end of the day to see what was selling at their location. Now they can check hourly and see what's happening at stores throughout a region that might be experiencing an unusual event such as a snowstorm or hurricane.
Phillips tells the story of how IT staff at Wal-Mart's Bentonville, Ark., headquarters tapped into the data warehouse the morning after Thanksgiving three years ago and noticed that East Coast sales of a computer-and-monitor holiday special were far below expectations. Marketing staff contacted stores and learned the computers and monitors weren't being displayed together, so potential buyers couldn't see what they were getting for the posted price. Calls went out to Wal-Mart stores across the country to rearrange the displays. "By 9:30 a.m. Central, the pace of sales could be seen picking up in our data," Phillips recalls.
Blurring The Lines
Data's usefulness is rarely so clear-cut. And Wal-Mart's capabilities are beyond the scope of most businesses. But its reliance on data for day-to-day business decisions is being emulated elsewhere, particularly in retail, telecommunications, financial services, and manufacturing.
The dividing line between operational and historical data isn't as firmly drawn as it was just a few years ago, says Bill O'Connell, chief technology officer of IBM's data-warehouse and business-intelligence business. "You're seeing a blurring of the lines between operational and strategic systems," he says. But that means the two must be carefully engineered to work together, which complicates the life of the database administrator even more.
EBay learned a big-database lesson or two as it rapidly grew into the world's largest online auction house. "We started in 1999 and 2000 with one monolithic Oracle database," says David Pride, VP of information management and delivery. "Since then, we've done a series of splits that let us scale out horizontally" into several hundred databases totaling 100 terabytes of data.
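The general idea of splitting one monolithic database into many smaller ones can be sketched in Python. This is a generic hash-routing illustration, not a description of eBay's scheme, which the article does not detail; the names `shard_for` and `ShardedStore` are hypothetical, and in-memory dictionaries stand in for the separate databases.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash, so the same key
    is always routed to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store spread across several shards. Each dict
    stands in for one independent database (an assumption made purely
    for illustration)."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # Route the write to the single shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # Reads touch only the owning shard, never the whole data set.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(10):
        store.put(f"auction-{i}", {"bids": i})
    print(store.get("auction-3"))
    print([len(shard) for shard in store.shards])
```

Because each key lives on exactly one shard, capacity can be added by adding databases rather than by buying a bigger single machine. The trade-off, which this sketch ignores, is that changing the shard count moves keys around and that queries spanning many keys must visit several shards.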