9/24/2003
04:43 PM

The Largest Data Warehouse In The World?

A project of the Windber Research Institute to combine clinical information with volumes of scientific data about genes and proteins will collect as much as 50 terabytes of data every nine months.

Windber Research Institute, an advanced biomedical research facility, is assembling a massive data warehouse that combines clinical information with volumes of scientific data about genes and proteins. The aim is to help understand the causes of, and find cures for, breast cancer, human reproductive cancers, and cardiovascular disease.

The data warehouse is an ambitious effort to pull clinical and scientific data into a single system, giving researchers an unprecedented opportunity to study the relationships among genes, proteins, and disease. The database will collect as much as 50 terabytes of data every nine months and over time could become the largest data warehouse in the world.

"No one has put all this information onto a single database platform," says Dr. Richard Somiari, chief operating officer and chief scientific officer at the institute, based in Windber, Pa. The system is based on data-warehouse hardware and software from NCR Corp.'s Teradata division. Details about the project are being disclosed this week at Teradata's user conference in Seattle.

Clinical data from patients at the Windber Medical Center, with which the research institute is affiliated, has already been loaded into the data warehouse. That includes data from tissue biopsies (each of which adds 166 Mbytes of data to the system), family histories, radiology (including X-ray images) and histopathology data, and patient DNA, RNA, and protein information.
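
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 terabyte = 10^12 bytes, 1 Mbyte = 10^6 bytes) and a constant collection rate, neither of which the institute has stated, and the function names are purely illustrative. The biopsy count is only a scale comparison: the 50 terabytes will also include other clinical data and imported research databases, not biopsy records alone.

```python
# Back-of-the-envelope sizing from the figures quoted in the article.
# Assumptions (not from the article): decimal units and a constant
# collection rate. Function names are hypothetical, for illustration only.

TB = 10**12  # bytes per terabyte (decimal, assumed)
MB = 10**6   # bytes per Mbyte (decimal, assumed)


def annual_growth_tb(tb_per_period: float, period_months: float) -> float:
    """Scale a per-period data volume to a 12-month rate, in terabytes."""
    return tb_per_period * 12 / period_months


def biopsies_per_volume(volume_tb: float, mb_per_biopsy: float) -> int:
    """Number of whole biopsy records of the given size that fit in the volume."""
    return int(volume_tb * TB // (mb_per_biopsy * MB))


if __name__ == "__main__":
    # 50 terabytes every nine months, expressed per year
    print(f"{annual_growth_tb(50, 9):.1f} TB per year")
    # 50 terabytes expressed as a count of 166-Mbyte biopsy records
    print(f"{biopsies_per_volume(50, 166):,} biopsy records per 50 TB")
```

Under those assumptions, the stated ceiling works out to roughly 67 terabytes a year, and 50 terabytes is on the order of 300,000 biopsy-sized records.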

The next step will be to add data from other research databases, including DNA data from GenBank, protein data from the Swiss-Prot database in Europe, metabolic pathway data from Kyoto University's KEGG (Kyoto Encyclopedia of Genes and Genomes) database, and protein-protein interaction data from the DIP (Database of Interacting Proteins) at UCLA.

Linking this basic research data with clinical information will allow researchers at Windber to examine multiple variables when investigating the causes of disease, Somiari says. The goals are to develop new strategies for managing patient conditions, discover new "markers" that help doctors diagnose diseases much earlier, and ultimately develop cures for the diseases.

Windber chose the Teradata system because of its scalability and parallel processing capabilities, says Nick Jacobs, the institute's president and CEO. He adds that Windber sought the same kind of technology that Wal-Mart and other commercial companies use to build their own massive data warehouses. The system uses analysis tools from Amersham Biosciences, Genomax Technologies, and Spotfire. Partners in the project include the U.S. Army's Walter Reed Army Medical Center, universities such as the University of Pennsylvania and Creighton University, and research institutes in the U.S., Europe, and Japan.
