CERN Project Will Collect Hundreds Of Petabytes Of Data
Near the Franco-Swiss border west of Geneva, Switzerland, CERN, the European Organization for Nuclear Research, is constructing a particle accelerator that scientists hope will give them new insights into the structure of matter. When the Large Hadron Collider begins operating in 2006, it will generate between 5 and 20 petabytes of raw data each year, for a total of hundreds of petabytes of data in the collider's projected lifetime of 10 to 15 years.
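The lifetime total follows from multiplying the yearly range by the lifetime range. A minimal sketch of that arithmetic, using only the figures quoted above (the function name is hypothetical, and it assumes the yearly rate stays constant over the collider's life):

```python
def lifetime_volume_pb(rate_pb_per_year, lifetime_years):
    """Return (low, high) total raw data in petabytes.

    rate_pb_per_year and lifetime_years are (min, max) tuples.
    Assumes a constant yearly rate, which is a simplification.
    """
    low = rate_pb_per_year[0] * lifetime_years[0]
    high = rate_pb_per_year[1] * lifetime_years[1]
    return low, high


# Figures from the article: 5 to 20 PB per year, 10 to 15 years.
low, high = lifetime_volume_pb((5, 20), (10, 15))
print(f"{low} to {high} PB")  # prints "50 to 300 PB"
```

Under these assumptions the total lands between 50 and 300 petabytes, which is consistent with the "hundreds of petabytes" estimate at the upper end of the range.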
CERN is designing a data warehouse to store all that information. The organization has created a prototype system with several hundred terabytes of simulation and test data stored in an object database from Objectivity Inc. That database is expected to reach 1 petabyte by 2004, says Jamie Shiers, a database group leader at CERN.
The organization is still mulling over many details about the collider database. What it does know is that the system will run on Intel servers. The test system has about 1,000 dual-processor servers incorporating IA-32 microprocessors, soon to be upgraded to IA-64 microprocessors, running Linux. "Our budget is very tight," Shiers says when asked about the reason for using the low-cost, open-source operating system.
The CERN development team is considering using the Oracle9i database for the data warehouse. The Objectivity software is something of a standard among physics labs: The Stanford Linear Accelerator Center at Stanford University also uses Objectivity. But CERN is considering Oracle for support reasons, since Oracle has extensive European operations and Objectivity is far away in Mountain View, Calif., Shiers says.
Early tests using Oracle's Real Application Clusters technology have had "encouraging results," Shiers says, although CERN hasn't decided which clustering technology to use.
CERN IT staffers also are debating how much data to store on tape and how much on disk. One possible approach is to keep a month's worth of data on disk for quick access and archive the rest of it on tape. Shiers says that decision will hinge on balancing the cost against data-access patterns.
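To get a rough sense of scale for the one-month-on-disk option, the yearly figures quoted earlier can be divided across the two tiers. This is only an illustrative sketch: the function name is hypothetical, and it assumes data arrives at a uniform rate through the year, which the article does not state.

```python
def disk_and_tape_pb(rate_pb_per_year, months_on_disk=1):
    """Split one year's raw data between disk and tape, in petabytes.

    Assumes a uniform data rate across the year (an assumption,
    not a CERN figure). Returns (disk_pb, tape_pb).
    """
    disk_pb = rate_pb_per_year * months_on_disk / 12
    tape_pb = rate_pb_per_year - disk_pb
    return disk_pb, tape_pb


# Yearly rates from the article: 5 PB (low) and 20 PB (high).
for rate in (5, 20):
    disk_pb, tape_pb = disk_and_tape_pb(rate)
    print(f"{rate} PB/year: {disk_pb:.2f} PB on disk, {tape_pb:.2f} PB on tape")
```

With these assumptions, a month of data would occupy roughly 0.4 to 1.7 petabytes of disk, with the remainder of each year's output going to tape.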