Feature | News | 2/8/2002 09:46 AM

CERN Project Will Collect Hundreds Of Petabytes Of Data

Near the Franco-Swiss border west of Geneva, Switzerland, CERN, the European Organization for Nuclear Research, is constructing a particle accelerator that scientists hope will give them new insights into the structure of matter. When the Large Hadron Collider begins operating in 2006, it will generate between 5 and 20 petabytes of raw data each year, adding up to hundreds of petabytes over the collider's projected lifetime of 10 to 15 years.
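
A rough back-of-the-envelope check of those figures, sketched in Python using only the rates and lifetime quoted above (the article's estimates, not precise CERN projections):

    # Rough sanity check of the data volumes quoted in the article.
    PB_PER_YEAR_LOW, PB_PER_YEAR_HIGH = 5, 20    # raw data per year, in petabytes
    YEARS_LOW, YEARS_HIGH = 10, 15               # projected collider lifetime, in years

    total_low = PB_PER_YEAR_LOW * YEARS_LOW      # 50 petabytes
    total_high = PB_PER_YEAR_HIGH * YEARS_HIGH   # 300 petabytes
    print(f"Lifetime raw-data volume: roughly {total_low} to {total_high} petabytes")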

CERN is designing a data warehouse to store all that information. The organization has created a prototype system with several hundred terabytes of simulation and test data stored in an object database from Objectivity Inc. That database is expected to reach 1 petabyte by 2004, says Jamie Shiers, a database group leader at CERN.

The organization is still mulling over many details of the collider database. What it does know is that the system will run on Intel servers. The test system has about 1,000 dual-processor servers running Linux on IA-32 microprocessors, soon to be upgraded to IA-64. "Our budget is very tight," Shiers says when asked why CERN chose the low-cost, open-source operating system.

The CERN development team is considering the Oracle9i database for the data warehouse. Objectivity's software is something of a standard among physics labs: The Stanford Linear Accelerator Center at Stanford University also uses it. But CERN is weighing Oracle for support reasons, Shiers says: Oracle has extensive European operations, while Objectivity is far away in Mountain View, Calif.

Early tests of Oracle's Real Application Clusters have produced "encouraging results," Shiers says, although CERN hasn't decided which clustering technology it will use.

CERN IT staffers are also debating how much data to store on tape and how much on disk. One possible approach is to keep a month's worth of data on disk for quick access and archive the rest to tape. That decision will hinge on balancing cost against data-access patterns, Shiers says.
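
One way to picture that approach is a simple age-based tiering rule, sketched below in Python; the 30-day cutoff and the function name are illustrative assumptions, not details CERN has settled on:

    from datetime import datetime, timedelta

    DISK_RETENTION = timedelta(days=30)   # assumed cutoff for "a month's worth" on disk

    def storage_tier(recorded_at, now=None):
        """Return the storage tier for a record under an age-based policy:
        recent data stays on disk for quick access, older data moves to tape."""
        now = now or datetime.utcnow()
        return "disk" if now - recorded_at <= DISK_RETENTION else "tape"

    # Example: data recorded 45 days ago would already have been archived to tape.
    print(storage_tier(datetime.utcnow() - timedelta(days=45)))   # -> tape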
