Database Project To Track Health Of 100,000 U.S. Children

The NIH-backed study will examine the effects of genes and environmental factors on the health of volunteer participants in 105 locations from before birth to age 21.
The bulk of the study data will be collected during "study visits" with the children, which will be more frequent for infants and babies and will likely become less frequent as the children grow older, Keim said. Those visits will take place at about 105 locations across the United States, including participating university medical centers as well as smaller study offices set up in locations more convenient for families.

"One of those visit centers is next to the only post office in town," Keim said. On other occasions, data from air quality and other environmental samples will be collected during visits to the children's homes. Keim estimates that there are close to 1,000 researchers and federal staff working on the study in teams across about 40 research centers, including medical schools, hospitals, public health departments, and nonprofit organizations.

The study visits "are not substitutes for usual medical care" children need from their pediatrician or other health care providers, Keim said.

As more doctors nationwide deploy electronic medical record systems in their offices, data from patient records will be used in the study as well. In the meantime, doctors, parents, and patients can update researchers on health issues during study visits, and information will also be collected in other ways. For instance, pregnant women will be asked to track care and health in a log, Keim said.

Researchers will collect data using tablet PCs running Windows XP. Keim said it's expected that 5,000 tablet PCs will be used for the full study, or about 50 tablets per study center. Study staff will securely transmit tablet data to the central coordinating center via VPN.
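As a rough check on those figures, the short Python sketch below divides the quoted totals. It assumes that "study center" refers to one of the roughly 105 visit locations and takes the 100,000-child figure from the headline; the article does not say how tablets will actually be allocated, so the variable names and the even split are illustrative only.

```python
# Figures quoted in the article
total_tablets = 5_000
total_children = 100_000   # from the headline

# Assumption: a "study center" is one of the ~105 visit locations
study_locations = 105

# Even-split averages (hypothetical; real allocation is not described)
tablets_per_location = total_tablets / study_locations
children_per_location = total_children / study_locations
children_per_tablet = total_children / total_tablets

print(f"Tablets per location:  {tablets_per_location:.1f}")
print(f"Children per location: {children_per_location:.0f}")
print(f"Children per tablet:   {children_per_tablet:.0f}")
```

Under that assumption the average works out to a little under 48 tablets per location, which is consistent with the "about 50" estimate.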

Among the software being used for the study are applications developed by the Centers for Disease Control and Prevention and other federal agencies. That includes CDC survey-authoring software that allows researchers to quickly and flexibly create questionnaires. "That sounds simple, but it's actually very hard to do," Keim said.

Data will be stored in Oracle and SQL databases. The size of the study's databases will be "relatively small in the beginning but will ramp up over the study," she said. "We will add storage as needed. It is tiered based on access time. We do not have the final estimate, but the size of the central database will certainly be in the multiple-terabyte range," she said.

"Much of the data analysis will be via commercial applications like SAS or SPSS," said Keim. "There will also be software tools to analyze genetic data, likely to be adapted from previous genetics studies and originally developed by consortia like through the Human Genome Project. Also, mapping tools will be important for data analysis, like ArcGIS," she said. ArcGIS is a product from GIS software vendor ESRI.

The team of researchers involved with the study includes obstetricians, pediatricians, social scientists, neurologists, and psychologists. Preliminary results from the study's early years are expected to begin appearing in 2011. The study will tackle subjects ranging from how a child's genes and environment interact to promote violent behavior in teenagers, to whether exposure to allergens early in life can actually help prevent asthma from developing in children.