Bigger & Better

Just a few years ago, a data warehouse or transactional database that approached a terabyte was considered big. Today, "big" means tens of terabytes. Here's the story behind four of the largest data systems in the world, plus a government database project expected to reach up to 5 petabytes (or 5,000 terabytes) within several years and up to 50 petabytes in 20 years. All are examples of organizations pushing the edge of what's possible with database technologies.
Defense Department
The Department of Defense is known for stretching the boundaries of technology. It's doing that again with a planned database system to hold medical records for 9 million military personnel that could eventually reach a capacity of 50 petabytes of data. To put that in perspective, consider that amount of data printed as double-spaced text on both sides of sheets of paper: "The stack would reach the moon and back to Earth--twice," estimates Richard Winter, president of database research firm Winter Corp.

The government's Composite Health Care System II, estimated to cost $3.8 billion over a 20-year period, is king-sized among database migration projects. It includes a data repository and data warehouse that, when completed in mid-2006, will handle text-based clinical records and digital images, such as X-rays, for all active U.S. military personnel and their families. It will start by managing about 5 petabytes of data--or 5,000 terabytes--and will grow each year to reach a projected total of 30 to 50 petabytes, says Larry Albert, senior VP of health-care practice at Integic, the prime contractor on the project.

Dr. Robert Wah -- Photo by David Deal

"We're moving from hospital-centric to patient-centric systems," says Dr. Robert Wah, a member of the U.S. Navy Medical Corps and a director of information management at the Defense Department. "As a physician, I can have the power of the computer--access to information electronically--to take care of my patients."

Winter says the Defense Department system's projected size is 100 times that of the largest health-care data warehouse his firm tracks. Integic's Albert estimates that at least 60% of the data volume will come from digital radiography images.

The Defense Department already is a technology leader in how it handles health-care records. For 10 years, hundreds of military hospitals and facilities worldwide have kept electronic clinical records, while many private-sector health-care organizations still depend on paper.

But the existing system is a sprawling network: at each hospital, clinical data from more than 60 systems is stored in hierarchical databases based on the Mumps language. That makes it difficult to access and share clinical information when military personnel relocate, as they constantly do.

The new project entails moving all clinical data to relational databases in two places: a data repository in Montgomery, Ala., that will be used by doctors and other clinicians as they treat patients, and a data warehouse in San Antonio, Texas, that will house aggregate data for analysis and reports by government researchers and others. The repository and the data warehouse will contain identical clinical data, but each will be optimized for its particular functions.

For the clinical repository, doctors will use established templates when making diagnoses. When examining a patient, a doctor can use a PC or another device to select the patient's symptoms from a list. That information will then be stored as "structured vocabulary text that is computable information, not just a blob of text," Albert says, so any doctor accessing a patient's records can interpret it easily. With patients' identities removed, the information also can be used in aggregate by the military or the Centers for Disease Control and Prevention to monitor patterns of symptoms that might indicate an epidemic or even signs of bioterrorism.

The repository and warehouse will run on Oracle software and Hewlett-Packard Superdome servers. Digital radiology images likely will be stored locally at the hospitals, Albert says. The central clinical repository will hold pointers to where those images are stored, providing access either almost instantly or over several hours, depending on factors such as data compression.

The biggest challenge will be monitoring performance, Albert says. As medical radiography technology evolves to produce higher-resolution digital images, the system has to be able to do an equally good job of compressing and decompressing that data so an image can be viewed in the same high resolution in which it was produced. The best measure of the project's success, Albert says, will be the level of satisfaction among the doctors who use it.

-- Marianne Holbasuk McGee
