Controversial Data-Mining Project Lives On

The government continues to finance research to create tools that could mine records for information about terrorists, even though Congress has eliminated a Pentagon office that was developing the technology.

InformationWeek Staff, Contributor

February 23, 2004


WASHINGTON (AP) -- The government is still financing research to create powerful tools that could mine millions of public and private records for information about terrorists despite an uproar last year over fears it might ensnare innocent Americans.

Congress eliminated a Pentagon office developing the terrorist tracking technology because of the outcry over privacy implications. But some of those projects from retired Adm. John Poindexter's Total Information Awareness effort were transferred to U.S. intelligence offices, congressional, federal, and research officials told The Associated Press.

In addition, Congress left undisturbed a separate but similar $64 million research program run by a little-known office called the Advanced Research and Development Activity (ARDA) that has used some of the same researchers as Poindexter's program.

"The whole congressional action looks like a shell game," said Steve Aftergood of the Federation of American Scientists, which tracks work by U.S. intelligence agencies. "There may be enough of a difference for them to claim TIA was terminated while for all practical purposes the identical work is continuing."

Poindexter's goal was to predict terrorist attacks by looking for telltale patterns of activity in passport applications, visas, work permits, driver's licenses, car rentals, airline ticket purchases, and arrests, as well as credit transactions and education, medical, and housing records.

But the research created a political uproar because such reviews of millions of transactions could put innocent Americans under suspicion. One of Poindexter's own researchers, David D. Jensen at the University of Massachusetts, has acknowledged that "high numbers of false positives can result."
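Jensen's warning about false positives comes down to base-rate arithmetic. When almost everyone screened is innocent, even a very accurate system flags far more innocent people than real suspects. The sketch below shows this. Every number in it is a hypothetical assumption chosen for illustration: a population of 300 million, 3,000 actual targets, and a system that catches 99 percent of targets while wrongly flagging 1 percent of innocents. None of these figures comes from the article or from any real program.

```python
def screening_outcome(population: int, true_targets: int,
                      sensitivity: float, false_positive_rate: float) -> dict:
    """Expected counts when every person in a population is screened.

    All inputs are hypothetical; they are not taken from the article
    or from any actual government program.
    """
    innocents = population - true_targets
    true_positives = true_targets * sensitivity        # real targets caught
    false_positives = innocents * false_positive_rate  # innocents wrongly flagged
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of flagged people who are actually targets (precision).
        "precision": true_positives / flagged,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 300 million people, 3,000 real targets,
    # 99% detection rate, 1% false-positive rate.
    r = screening_outcome(300_000_000, 3_000, 0.99, 0.01)
    print(f"false positives: {r['false_positives']:,.0f}")
    print(f"precision: {r['precision']:.4%}")
```

Under these assumed inputs, roughly 3 million innocent people would be flagged, and fewer than 1 in 1,000 flagged people would be a real target.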

Disturbed by the privacy implications, Congress last fall closed Poindexter's office, part of the Defense Advanced Research Projects Agency, and barred the agency from continuing nearly all his research. Poindexter quit government, claiming his work was misunderstood.

But the work didn't die.

In killing Poindexter's office, Congress agreed to continue paying to develop highly specialized software to gather foreign intelligence on terrorists.

In a classified section summarized publicly, Congress gave money to the "National Foreign Intelligence Program," without openly identifying which intelligence agency would do the work. It said the product of the research could only be used overseas or against non-U.S. citizens in this country, not against Americans on U.S. soil.

Congressional officials declined to say which Poindexter programs were killed and which were transferred, but people with direct knowledge of the contracts told AP that the surviving programs included some of the 18 data-mining projects that made up Poindexter's Evidence Extraction and Link Discovery research.

Poindexter's office described that research as "technology not only for 'connecting the dots' that enable the U.S. to predict and pre-empt attacks, but also for deciding which dots to connect." It was among the government's most controversial research programs.

Ted Senator, who managed that research for Poindexter, told government contractors that mining data to identify terrorists "is much harder than simply finding needles in a haystack."

"Our task is akin to finding dangerous groups of needles hidden in stacks of needle pieces," he said. "We must track all the needle pieces all of the time."

Of Senator's 18 projects, Jensen's shows how flexible such powerful software can be. Jensen used two online databases, the Internet Movie Database and the Physics Preprint Archive, to develop tools that would predict whether a movie would gross more than $2 million in its opening weekend and would identify authoritative physics authors.

Jensen said in an interview that Poindexter's staff liked his research because the data involved "people, and organizations, and events, ... like the data in counterterrorism."

At the University of Southern California, professor Craig Knoblock said he had developed software that automatically extracted information from travel Web sites and telephone books and tracked changes over time.

Privacy advocates feared that if such powerful tools were developed without limits from Congress, government agents could use them on any database.

Sen. Ron Wyden, D-Ore., who fought to restrict Poindexter's office, is trying to force the executive branch to tell Congress about all its data-mining projects. He recently pleaded with a Pentagon advisory panel to propose rules for reviewing data, rules that Congress could then write into law.

ARDA sponsors corporate and university research on information technology for U.S. intelligence agencies. It is developing computer software that can extract information from databases as well as text, voices, other audio, video, graphs, images, maps, equations, and chemical formulas. It calls its effort "Novel Intelligence from Massive Data."

ARDA said it has not given researchers government or private data and obeys privacy laws.

The project is part of its effort "to help the nation avoid strategic surprise, ... events critical to national security, ... such as those of September 11, 2001," the office said.

Poindexter had envisioned software that could quickly analyze "multiple petabytes" of data. One petabyte could hold the contents of the Library of Congress's 18 million books more than 50 times over, or 40 pages of text for each of the more than 6.2 billion people on Earth.
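The petabyte comparisons can be checked by working backward to the storage each one implies per page and per book. The sketch below does that using only the figures quoted above. It assumes a decimal petabyte of 10^15 bytes; if a binary petabyte (2^50 bytes) was meant, the implied sizes come out about 12.6 percent larger.

```python
# Assumption: decimal (SI) petabyte, 10**15 bytes, not 2**50 bytes.
PETABYTE = 10**15

# Figures quoted in the article.
WORLD_POPULATION = 6.2e9   # "more than 6.2 billion humans"
PAGES_PER_PERSON = 40      # "40 pages of text for each"
LOC_BOOKS = 18_000_000     # Library of Congress: 18 million books
LOC_COPIES = 50            # "more than 50 times"


def bytes_per_unit(total_bytes: float, units: float) -> float:
    """Average bytes available per unit if total_bytes is split evenly."""
    return total_bytes / units


per_page = bytes_per_unit(PETABYTE, WORLD_POPULATION * PAGES_PER_PERSON)
per_book = bytes_per_unit(PETABYTE, LOC_BOOKS * LOC_COPIES)

print(f"implied bytes per page: {per_page:,.0f}")
print(f"implied bytes per book: {per_book:,.0f}")
```

The implied figures, about 4 KB per page and about 1.1 MB per book, are plausible orders of magnitude for plain text. That suggests both comparisons are internally consistent.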

ARDA said its software would have to deal with "typically a petabyte or more" of data. It noted that some intelligence data sources "grow at the rate of four petabytes per month." Experts said those are probably files with satellite surveillance images and electronic eavesdropping results.

The Poindexter and ARDA projects are vastly more powerful than other data-mining projects, such as the Department of Homeland Security's CAPPS II program to classify air travelers or the six-state Matrix anti-crime system funded by the Justice Department. Those programs use commercial data-mining technology, with which, Poindexter's office said, it would "take decades" to build "the new databases we need to combat terrorism."

In September 2002, ARDA awarded $64 million in contracts over three and a half years. The contracts went to more than a dozen companies and university researchers, including at least six who also had worked on Poindexter's program.

Congress's action threw these researchers into turmoil. Doug Lenat, the president of Cycorp in Austin, Texas, declined to discuss his work but said the company had an "enormous seven-figure deficit in our budget" because Congress shut down Poindexter's office.

Like many critics, James Dempsey of the Center for Democracy and Technology sees a role for properly regulated data mining in evaluating the vast, under-analyzed stores of data the government already collects.

But expansions of data mining increase "the risk of an innocent person being in the wrong place at the wrong time, of having rented the wrong apartment ... or having a name similar to the name of some bad guy," he said.
