Can Data Mining Catch Terrorists?

The NSA is said to be analyzing phone records for terrorist information. That would be technologically difficult--but possible.

J. Nicholas Hoover, Senior Editor, InformationWeek Government

May 20, 2006


One data mining effort within the Defense Department, called Pathfinder, analyzes government and private-sector databases, rapidly comparing and searching multiple large databases for anti-terrorism intelligence. The FBI's Foreign Terrorist Tracking Task Force culls data from the Department of Homeland Security, the FBI, and public data sources to keep foreign terrorists from entering the country. Other counterterrorism tools include technology from Autonomy that searches Word documents across various intelligence agency databases; Verity's K2 Enterprise, which mines data from the intelligence community and through Internet searches; and Inxight's SmartDiscovery, which analyzes and categorizes data in unstructured text.

No matter how you slice it, what the government is doing can't be easy. There are many data sources, and data integrity varies from one source to the next. "The analyst is faced with trying to find something of importance across all these sources potentially containing billions of records," Westphal says.

In the aftermath of the NSA brouhaha, Congress wants to find out whether mining phone call data can accurately identify patterns and transactions and develop predictive models without invading the privacy of innocent citizens, says Sen. John Sununu, R-N.H. "The key is bringing in oversight, asking tough questions, making sure the appropriate information is provided," Sununu said in an interview with InformationWeek.

Privacy Matters

On the issue of privacy, the NSA might learn something from the business world. Retailers, for example, mine data on customer interactions and purchase histories to determine promotions or in-store placement, all without invading customer privacy.

Mining For Terrorists

The government for years has analyzed data for patterns and relationships that could point to terrorists. The Defense Department's Special Operations Command conducted social network analysis on al-Qaida as part of its Able Danger program before the 9/11 attacks.

As of 2004, there were 14 nonsecret active or planned data mining projects for intelligence gathering and counterterrorism efforts across 52 federal agencies.

Intelligence agencies, including the NSA and CIA, have contracts with numerous data mining vendors, including Cognos, IBM, and Teradata.

The Defense Advanced Research Projects Agency's Terrorism Information Awareness program, a project to mine vast amounts of personal data to identify terrorists, lost congressional funding in 2003 after public outcry over privacy concerns.

The NSA declined to comment for this story. If published reports are correct, however, its database would consist partially or entirely of call records. These records include outgoing and incoming phone numbers, time stamps, and other information, such as whether the call had been forwarded, but not names. USA Today reported that AT&T, BellSouth, and Verizon gave the government access to call data records starting in late 2001. The ambitious goal, according to an unnamed source quoted by the paper, is to put "every call ever made" in the United States into the database. Verizon and BellSouth last week said they weren't involved, though Verizon didn't specify whether MCI, which it acquired last year, ever participated in such activity.
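Call records of the kind described here carry originating and receiving numbers, time stamps, and flags such as call forwarding, but no names. Even so, they are enough for link analysis. As a purely illustrative sketch, not a description of any NSA or carrier system, the Python below models such a record with a hypothetical `CallRecord` type and runs a breadth-first search that lists every number within a given number of calls of a seed number. The helper names `build_contact_graph` and `within_hops` and the fictional 555-01xx numbers are assumptions made for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call record: numbers, time stamp, forwarding flag, no names."""
    caller: str         # originating (outgoing) number
    callee: str         # receiving (incoming) number
    start: datetime     # time stamp of the call
    forwarded: bool     # whether the call was forwarded


def build_contact_graph(records):
    """Map each number to the set of numbers it called or was called by."""
    graph = defaultdict(set)
    for r in records:
        graph[r.caller].add(r.callee)
        graph[r.callee].add(r.caller)
    return graph


def within_hops(graph, seed, max_hops):
    """Breadth-first search: numbers reachable from `seed` in 1..max_hops calls."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        number = queue.popleft()
        if depth[number] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(number, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[number] + 1
                queue.append(neighbor)
    return {n: d for n, d in depth.items() if n != seed}


# Fictional 555-01xx numbers, for illustration only.
records = [
    CallRecord("555-0100", "555-0101", datetime(2006, 5, 1, 9, 0), False),
    CallRecord("555-0101", "555-0102", datetime(2006, 5, 1, 9, 5), False),
    CallRecord("555-0102", "555-0103", datetime(2006, 5, 1, 9, 10), True),
]
graph = build_contact_graph(records)
print(within_hops(graph, "555-0100", 2))  # {'555-0101': 1, '555-0102': 2}
```

Treating each call as an undirected link deliberately ignores who called whom. A directed graph would preserve that distinction at the cost of a slightly more involved search.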

AT&T, which neither confirmed nor denied the report, handles about a third of the calls made in the United States and operates some 49.4 million phone lines. AT&T manages a database called Hawkeye that contained 312 terabytes of uncompressed data as of September, representing 1.88 trillion call records. That comes out to 166 bytes per call record.
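The per-record figure is simple division. A quick check in Python, assuming decimal terabytes (10^12 bytes), which is the reading that reproduces the 166-byte result. With binary units (2^40 bytes), the figure would come out closer to 182 bytes.

```python
# Reported figures for AT&T's Hawkeye database (as of September).
HAWKEYE_BYTES = 312e12     # 312 terabytes uncompressed, assuming decimal TB (10**12 bytes)
HAWKEYE_RECORDS = 1.88e12  # 1.88 trillion call records

bytes_per_record = HAWKEYE_BYTES / HAWKEYE_RECORDS
print(f"{bytes_per_record:.0f} bytes per call record")  # 166 bytes per call record
```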

Suppose AT&T customers average about 10 calls per phone line a day. If the NSA has access to five years of AT&T calls, then at 166 bytes per record its alleged database would contain about 150 terabytes of call records. Compare that with the largest commercial databases. As of late last year, Wal-Mart stored about 583 terabytes of data in a massively parallel, 1,000-processor NCR Teradata data warehouse, and it was adding a billion records a day.
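The 150-terabyte estimate comes from multiplying phone lines, calls per line per day, days, and bytes per record. The sketch below reproduces that arithmetic using the assumed average of 10 calls per line per day, the 166-byte record size derived from the Hawkeye figures, 365-day years, and decimal terabytes. The helper name `estimated_terabytes` is hypothetical.

```python
def estimated_terabytes(lines, calls_per_line_per_day, years, bytes_per_record):
    """Storage for `years` of call records, in decimal terabytes (10**12 bytes)."""
    calls = lines * calls_per_line_per_day * 365 * years  # leap days ignored
    return calls * bytes_per_record / 1e12


tb = estimated_terabytes(
    lines=49.4e6,               # AT&T phone lines
    calls_per_line_per_day=10,  # assumed average
    years=5,
    bytes_per_record=166,       # derived from the Hawkeye figures
)
print(f"about {tb:.0f} TB")  # about 150 TB
```

The estimate is linear in every input, so doubling the calls per line or the number of years doubles the total.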

Heavy-Duty Management

Any database software the NSA might be using would need vast amounts of storage and heavy-duty data management capabilities. Surveys by Winter Corp., a database consulting firm, have found that the largest databases are tripling in size every two years. "It's got to be able to load huge volumes of data rapidly and in a highly parallel way, and to search data in a highly parallel and efficient way," says company president Richard Winter.
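Tripling every two years compounds quickly. It works out to a factor of the square root of 3, or roughly 73 percent growth, per year. The sketch below shows the compounding with a hypothetical `projected_size` helper and an arbitrary 100-terabyte starting point chosen only for illustration. Neither is a figure from the Winter Corp. surveys.

```python
def projected_size(current_tb, years, factor=3.0, period_years=2.0):
    """Size after `years`, assuming growth by `factor` every `period_years`."""
    return current_tb * factor ** (years / period_years)


annual_factor = 3.0 ** 0.5  # about 1.73, i.e. roughly 73% growth per year

for years in (1, 2, 4, 6):
    print(f"after {years} years: {projected_size(100, years):.0f} TB")
```

Starting from 100 terabytes, the loop prints 173, 300, 900, and 2,700 terabytes, so a database at that growth rate grows 27-fold in six years.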

A handful of commercial relational databases--from IBM, Oracle, Sybase, and Teradata--might be able to handle a vast volume of phone records, or the NSA could build such a database itself. AT&T, for example, has contracts with Teradata and IBM, but the carrier's big Daytona database was developed internally.
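Winter's point about loading and searching data "in a highly parallel way" describes the shared-nothing pattern used by massively parallel warehouses. Rows are spread across nodes by hashing a key, each node scans only its own slice, and the partial results are merged. The sketch below imitates that pattern on a single machine with Python threads. The record layout, the `partition`/`scan`/`parallel_search` helpers, and the partition count are illustrative assumptions, not how IBM, Oracle, Sybase, Teradata, or AT&T's Daytona actually work. Threads show the structure rather than a real speedup, since CPython's global interpreter lock serializes pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(records, key, n_parts):
    """Spread records across n_parts buckets by a stable hash of one field."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        # crc32 is stable across runs, unlike Python's randomized str hash.
        parts[crc32(rec[key].encode()) % n_parts].append(rec)
    return parts


def scan(part, predicate):
    """Each worker scans only its own partition."""
    return [rec for rec in part if predicate(rec)]


def parallel_search(records, key, predicate, n_parts=4):
    """Partition, scan every partition concurrently, then merge the hits."""
    parts = partition(records, key, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(scan, parts, [predicate] * n_parts))
    return [rec for chunk in results for rec in chunk]


# Fictional records: which callers dialed 555-0199?
records = [
    {"caller": f"555-01{i:02d}", "callee": "555-0199" if i % 3 == 0 else "555-0150"}
    for i in range(12)
]
hits = parallel_search(records, "caller", lambda r: r["callee"] == "555-0199")
print(sorted(r["caller"] for r in hits))
```

Because hits come back grouped by partition, the merged list is not in input order. The example sorts it before printing.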

More powerful servers, falling storage prices, and new search and data mining techniques are all working in the NSA's favor. "Ten years ago, you couldn't have accomplished the same thing," Winter says. "It would have been too expensive to put all the information online, and we didn't have the systems capable of searching and mining at high speed."

Still, it's questionable how much the NSA could learn by mining data on only some of the calls made within the United States. More than 1,000 wireless carriers, Internet service providers, rural phone companies, voice-over-IP service providers, and long distance companies handle phone calls. For a complete picture, the NSA would need to draw in much of that data, and the more data, the bigger the task. "The history of the intelligence community is information glut," says Mark Pollitt, a former FBI agent and an adjunct professor at Johns Hopkins' School of Professional Studies in Business and Education. "We're good at collecting stuff, but how do you figure out if any of it is any good? This is perhaps the toughest issue with regard to counterterrorism."

with Larry Greenemeier and Elena Malykhina


