The Largest Data Warehouse In The World? - InformationWeek

The Largest Data Warehouse In The World?

A project of the Windber Research Institute to combine clinical information with volumes of scientific data about genes and proteins will collect as much as 50 terabytes of data every nine months.

Windber Research Institute, an advanced biomedical research facility, is assembling a massive data warehouse that combines clinical information with volumes of scientific data about genes and proteins. The aim is to help understand the causes of, and find cures for, breast cancer, human reproductive cancers, and cardiovascular disease.

The data warehouse is an ambitious effort to pull clinical and scientific data into a single system, giving researchers an unprecedented opportunity to study the relationship between genes, proteins, and disease. The database will collect as much as 50 terabytes of data every nine months and over time could become the largest data warehouse in the world.

"No one has put all this information onto a single database platform," says Dr. Richard Somiari, chief operating officer and chief scientific officer at the institute, based in Windber, Pa. The system is based on data-warehouse hardware and software from NCR Corp.'s Teradata division. Details about the project are being disclosed this week at Teradata's user conference in Seattle.

Clinical data from patients at the Windber Medical Center, with which the research institute is affiliated, has already been loaded into the data warehouse. That includes data from tissue biopsies (each of which adds 166 Mbytes of data to the system), family histories, radiology data (including X-ray images), histopathology data, and patient DNA, RNA, and protein information.
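To put the quoted volumes in perspective, the short Python sketch below does the arithmetic on the two figures given in the article: up to 50 terabytes every nine months, and 166 Mbytes per tissue biopsy. It assumes decimal units (1 terabyte = 1,000,000 Mbytes), which the article does not specify, and the function names are illustrative only.

```python
# Back-of-the-envelope sizing from the figures quoted in the article.
# Assumption: decimal units (1 TB = 1,000,000 MB); the article does not say.
MB_PER_TB = 1_000_000

TB_PER_PERIOD = 50      # upper bound quoted: 50 TB every nine months
MONTHS_PER_PERIOD = 9
MB_PER_BIOPSY = 166     # each tissue biopsy adds 166 Mbytes


def tb_per_year(tb_per_period=TB_PER_PERIOD, months=MONTHS_PER_PERIOD):
    """Scale the nine-month collection figure to a twelve-month rate."""
    return tb_per_period * 12 / months


def biopsies_per_tb(mb_per_biopsy=MB_PER_BIOPSY):
    """Number of biopsy records that fit in one terabyte."""
    return MB_PER_TB / mb_per_biopsy


if __name__ == "__main__":
    print(f"{tb_per_year():.1f} TB per year at the upper bound")
    print(f"{biopsies_per_tb():.0f} biopsies per TB")
```

At the upper bound this works out to roughly 66.7 terabytes per year, and about 6,000 biopsies per terabyte. Biopsy data is only one of several sources listed, so the second figure is a scale reference, not an estimate of patient counts.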

The next step will be to add data from other research databases, including DNA data from GenBank, protein data from the Swiss-Prot database in Europe, metabolic pathway data from Kyoto University's KEGG (Kyoto Encyclopedia of Genes and Genomes) database, and protein-protein interaction data from the DIP (Database of Interacting Proteins) database at UCLA.

Linking this basic research data with clinical information will allow researchers at Windber to examine multiple variables when investigating the causes of disease, Somiari says. The goals are to develop new strategies for managing patient conditions, discover new "markers" that help doctors diagnose diseases much earlier, and ultimately develop cures for the diseases.
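As a rough illustration of what linking clinical and research data means in practice, the Python sketch below joins a few clinical rows to reference annotations on a shared gene identifier, then groups patients by pathway. Everything in it is hypothetical: the field names, the placeholder identifiers such as `GENE_A`, and the functions `link` and `patients_by_pathway` are assumptions for illustration, not Windber's schema or Teradata's API.

```python
from collections import defaultdict

# Hypothetical clinical rows: (patient_id, gene_symbol, diagnosis).
clinical = [
    ("P001", "GENE_A", "case"),
    ("P002", "GENE_B", "control"),
    ("P003", "GENE_A", "case"),
]

# Hypothetical reference annotations keyed by gene symbol, standing in
# for data drawn from public protein and pathway databases.
reference = {
    "GENE_A": {"protein": "PROT_A", "pathway": "PATHWAY_1"},
    "GENE_B": {"protein": "PROT_B", "pathway": "PATHWAY_2"},
}


def link(clinical_rows, reference_by_gene):
    """Attach reference annotations to each clinical row that has a match."""
    linked = []
    for patient_id, gene, diagnosis in clinical_rows:
        annotation = reference_by_gene.get(gene)
        if annotation is None:
            continue  # no reference data for this gene; skip the row
        linked.append(
            {"patient_id": patient_id, "gene": gene,
             "diagnosis": diagnosis, **annotation}
        )
    return linked


def patients_by_pathway(linked_rows):
    """Group patient IDs by the pathway their gene is annotated with."""
    groups = defaultdict(list)
    for row in linked_rows:
        groups[row["pathway"]].append(row["patient_id"])
    return dict(groups)


if __name__ == "__main__":
    print(patients_by_pathway(link(clinical, reference)))
```

In a data warehouse the same idea would normally be expressed as a join between tables; plain dictionaries are used here only to keep the sketch self-contained.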

Windber chose the Teradata system because of its scalability and parallel processing capabilities, says Nick Jacobs, the institute's president and CEO. He adds that Windber sought the same kind of technology that Wal-Mart and other commercial companies use to build their own massive data warehouses. The system uses analysis tools from Amersham Biosciences, Genomax Technologies, and Spotfire. Partners in the project include the U.S. Army's Walter Reed Army Medical Center, universities such as the University of Pennsylvania and Creighton University, and research institutes in the U.S., Europe, and Japan.
