The database at the Stanford Linear Accelerator has reached the 500 terabyte mark, roughly equivalent to a billion books, or 60 times the content of the U.S. Library of Congress. It's not a magic number, says Jacek Becla, database group manager for the accelerator, but it is a marker--and it won't stand long. "Within a year or so, we expect it will reach a petabyte."
The nearly unimaginable amount of data is generated by the BABAR experiment, a collaboration of 600 physicists from nine countries. They're smashing subatomic particles in the mile-long supercollider to create anti-matter and observe how it behaves alongside matter. Using a 1,000-ton particle detector to observe these collisions, scientists hope to gain a better understanding of how the universe was formed, and perhaps even to answer one of nature's biggest mysteries: why the universe is dominated by matter rather than anti-matter.
When Becla's team began the project in 1998 with $177 million from the Department of Energy, it opted for object-oriented database technology over the more ubiquitous relational database systems. Because relational databases are pretty much the business standard, however, there are relatively few experts in object-oriented database technology. Companies aren't really looking at it, even though Becla says it's more scalable, partly because they've already chosen another standard. The customization effort is also daunting: of the 5 million lines of BABAR's C++ code in the accelerator's database, half a million are dedicated to customizing the object-oriented database engine.
While it takes 2,000 CPUs and 100 servers to support the system, not many people are needed--three developers and three database administrators handle the load.
Five hundred terabytes is much larger than most businesses need now, but that will soon change, contends Becla, whose group monitors large database systems. Most are less than a hundredth the size of this one, but he thinks many businesses will eventually need superdata storage. With bioinformatics, streaming video, and other large applications, he says, "it will happen sooner than they think."