Harvard Goes To School On Big Data
IBM Netezza appliance powers medical school's analysis of 10-million-plus patient records for drug safety research.
July 5, 2011
"If a patient has a high lipid test [LDL] level, for example, it's more likely they will take a lipid-lowering medication, and at the same time it's more likely that they will have a heart attack," Schneeweiss explains.
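The confounding Schneeweiss describes can be illustrated with a toy calculation. The sketch below uses patient counts that are invented purely for illustration (they are not from the Harvard research), and it adjusts for a single risk factor by simple stratification. That is a much cruder technique than the high-dimensional propensity scoring the researchers use, but it shows the same underlying idea. In this made-up cohort the drug has no effect on heart-attack risk within either LDL group, yet a naive comparison makes drug users look worse by about 4.8 percentage points.

```python
# Hypothetical cohort, invented for illustration only.
# Each row: (ldl_level, on_drug, patients, heart_attacks)
cohort = [
    ("high", True, 800, 80),     # high LDL, on drug: 10% risk
    ("high", False, 200, 20),    # high LDL, no drug: 10% risk
    ("normal", True, 200, 4),    # normal LDL, on drug: 2% risk
    ("normal", False, 800, 16),  # normal LDL, no drug: 2% risk
]


def risk(rows):
    """Share of patients in the given rows who had a heart attack."""
    patients = sum(r[2] for r in rows)
    events = sum(r[3] for r in rows)
    return events / patients


def crude_risk_difference(rows):
    """Naive comparison: risk among drug users minus risk among non-users."""
    treated = [r for r in rows if r[1]]
    untreated = [r for r in rows if not r[1]]
    return risk(treated) - risk(untreated)


def stratified_risk_difference(rows):
    """Compare users and non-users within each LDL group, then average
    the per-group differences weighted by group size."""
    total = sum(r[2] for r in rows)
    weighted = 0.0
    for level in sorted({r[0] for r in rows}):
        group = [r for r in rows if r[0] == level]
        treated = [r for r in group if r[1]]
        untreated = [r for r in group if not r[1]]
        group_size = sum(r[2] for r in group)
        weighted += (risk(treated) - risk(untreated)) * group_size / total
    return weighted


crude = crude_risk_difference(cohort)
adjusted = stratified_risk_difference(cohort)
print("crude risk difference:", round(crude, 3))
print("LDL-adjusted risk difference:", round(adjusted, 3))
```

The apparent harm disappears once patients are compared only with others at the same LDL level. With hundreds of such factors instead of one, the comparison has to be repeated and refined many times, which is where processing speed starts to matter.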
LDL levels are just one risk factor out of hundreds that are identified and prioritized by high-dimensional propensity scores. It takes time to develop and run the algorithms, and that comes back to the capacity and speed of the analytics platform. Unless they did elaborate and time-consuming database tuning and optimization work, researchers found that many of their iterative algorithms took as long as overnight or a weekend to run.
"By 2009 we recognized that we needed a fundamentally different approach," says Schneeweiss. The different approach embraced by the commercial world for big-data processing has been massively parallel processing appliances built on commodity (mostly Intel x86) servers rather than clusters of expensive proprietary symmetric multiprocessor servers.
Harvard didn't have to look far to find such an appliance: in 2010 it was approached by IBM Netezza, headquartered in nearby Marlborough, Mass., to explore the possibility of a research partnership. (Competitors will undoubtedly point out that Netezza still uses proprietary Field Programmable Gate Arrays for data filtering, but the company switched to commodity x86 processors and storage in 2009 with the move to its TwinFin architecture.)
Appliances are typically a seven-figure investment, but through the partnership, Harvard did not have to pay for its appliance. "That explains why we didn't shop around -- it was a godsend that came at the right moment," says Schneeweiss.
The transition to IBM Netezza happened quickly early this year, as IBM Netezza had a TwinFin appliance up and running at a Harvard research data center within two days. Once data was migrated to the new environment, Schneeweiss says the school's six programmers were able to do analyses at least ten times faster without any optimization. "We have one analysis of data on 150,000 patients that took 20 minutes, with optimization, in the old environment, and it now takes two seconds without any special tuning," he says.
Given the faster analysis speeds and minimal tuning now required, researchers now routinely apply high-dimensional propensity scoring techniques to improve the accuracy of their research. "That gets us that much closer to causal conclusions, and researchers can act upon that insight," Schneeweiss says. The faster Harvard's researchers can develop conclusive research, the sooner they will be able to help drug companies, the FDA, and other regulatory agencies take risky drugs off the market and steer practitioners toward the safest and most effective medications available.
For IBM Netezza, promoting the use of the company's technology among prestigious researchers helps open doors at other research facilities and at commercial firms, such as pharmaceutical giants. "We at Netezza are excited that our collaboration with these notable Harvard Medical School faculty and researchers has already led to leveraging IBM research development efforts and existing products toward revolutionizing computational pharmacoepidemiology," wrote Shawn Dolley, vice president and general manager of the Healthcare & Life Sciences practice at IBM Netezza. It's the kind of goodwill gesture that has always paid off for IBM, even if it means giving away a million-dollar-plus appliance.