CERN Project Will Collect Hundreds Of Petabytes Of Data

sidebar story to "Tower Of Power," 2/11/2002, InformationWeek.com

InformationWeek Staff, Contributor

February 8, 2002

Near the Franco-Swiss border west of Geneva, Switzerland, CERN, the European Organization for Nuclear Research, is constructing a particle accelerator that scientists hope will give them new insights into the structure of matter. When the Large Hadron Collider begins operating in 2006, it will generate between 5 and 20 petabytes of raw data each year, for a total of hundreds of petabytes of data in the collider's projected lifetime of 10 to 15 years.
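
The range behind that lifetime estimate is simple multiplication. The Python sketch below is an illustration only: it uses the rates and lifetimes quoted above, the function name is hypothetical, and it assumes the annual rate stays constant for the collider's whole life, which CERN has not said.

```python
def lifetime_volume_pb(rate_low_pb, rate_high_pb, years_low, years_high):
    """Return the (minimum, maximum) total raw data volume in petabytes.

    Assumption (not from CERN): the annual data rate is constant
    over the collider's whole operating life.
    """
    minimum = rate_low_pb * years_low      # slowest rate, shortest lifetime
    maximum = rate_high_pb * years_high    # fastest rate, longest lifetime
    return minimum, maximum


# Figures quoted in the article: 5 to 20 PB per year, 10 to 15 years.
low, high = lifetime_volume_pb(5, 20, 10, 15)
print(f"Projected lifetime volume: {low} to {high} PB")
```

With the quoted figures, the total falls between 50 and 300 petabytes, so the "hundreds of petabytes" estimate corresponds to the upper part of that range.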

CERN is designing a data warehouse to store all that information. The organization has created a prototype system with several hundred terabytes of simulation and test data stored in an object database from Objectivity Inc. That database is expected to reach 1 petabyte by 2004, says Jamie Shiers, a database group leader at CERN.

The organization is still mulling over many details about the collider database. What it does know is that the system will run on Intel servers. The test system has about 1,000 dual-processor servers running Linux on IA-32 microprocessors, which are soon to be upgraded to IA-64 microprocessors. "Our budget is very tight," Shiers says when asked about the reason for using the low-cost, open-source operating system.

The CERN development team is considering using the Oracle9i database for the data warehouse. The Objectivity software is something of a standard among physics labs: The Stanford Linear Accelerator Center at Stanford University also uses Objectivity. But CERN is considering Oracle for support reasons, since Oracle has extensive European operations and Objectivity is far away in Mountain View, Calif., Shiers says.

Early tests of Oracle's Real Application Clusters technology have had "encouraging results," Shiers says, although CERN hasn't decided which clustering technology to use.

CERN IT staffers also are debating how much data to store on tape and how much on disk. One possible approach is to keep a month's worth of data on disk for quick access and archive the rest of it on tape. Shiers says that decision will hinge on balancing the cost against data-access patterns.
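
To give a sense of scale for the one-month-on-disk approach, here is a rough Python sketch. It is not CERN's sizing model: the function name and parameters are hypothetical, and it assumes data arrives at a steady rate throughout the year.

```python
def disk_tape_split(annual_pb, months_on_disk, years_elapsed):
    """Return (disk_pb, tape_pb) after a given number of years of running.

    Assumptions (not from CERN): data arrives at a uniform rate, only the
    most recent `months_on_disk` months stay on disk, and everything
    older is archived to tape.
    """
    total_pb = annual_pb * years_elapsed
    # Disk holds the most recent window, but never more than exists.
    disk_pb = min(annual_pb * months_on_disk / 12, total_pb)
    tape_pb = total_pb - disk_pb
    return disk_pb, tape_pb


# Both ends of the article's 5-to-20-PB-per-year range, after one year.
for annual in (5, 20):
    disk, tape = disk_tape_split(annual, months_on_disk=1, years_elapsed=1)
    print(f"{annual} PB/year: {disk:.2f} PB on disk, {tape:.2f} PB on tape")
```

Under these assumptions, the disk tier stays at roughly one-twelfth of a year's output while the tape archive keeps growing, which is why the cost balance Shiers describes depends on how often older data is read.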
