Can Data Mining Catch Terrorists? - InformationWeek


Can Data Mining Catch Terrorists?

The NSA is said to be analyzing phone records for terrorist information. That would be technologically difficult--but possible.

One data mining effort within the Defense Department, called Pathfinder, involves analyzing government and private-sector databases, including rapidly comparing and searching multiple large databases for anti-terrorism intelligence. The FBI's Foreign Terrorist Tracking Task Force culls data from the Department of Homeland Security, the FBI, and public data sources to prevent foreign terrorists from entering the country. Other tools for counterterrorism include technology from Autonomy that searches Word documents across various intelligence agency databases; Verity's K2 Enterprise, which mines data from the intelligence community and through Internet searches; and Inxight's Smart Discovery, which looks into and categorizes data in unstructured text.

No matter how you slice it, what the government is doing can't be easy. There are many data sources, and the integrity of data is different for each. "The analyst is faced with trying to find something of importance across all these sources potentially containing billions of records," Westphal says.

In the aftermath of the NSA brouhaha, Congress wants to find out if the mining of phone call data can accurately identify patterns and transactions and develop predictive models--without invading the privacy of innocent citizens, says Sen. John Sununu, R-N.H. "The key is bringing in oversight, asking tough questions, making sure the appropriate information is provided," Sununu said in an interview with InformationWeek.

Privacy Matters

On the issue of privacy, the NSA might learn something from the business world. Retailers, for example, mine data on customer interactions and purchase histories to determine promotions or in-store placement, all without invading customer privacy.

Mining For Terrorists
The government for years has analyzed data for patterns and relationships that could point to terrorists
The Defense Department's Special Operations Command conducted social network analysis on al-Qaida as part of its Able Danger program before the 9/11 attack
As of 2004, there were 14 nonsecret active or planned data mining projects for intelligence gathering and counterterrorism efforts across 52 federal agencies
Intelligence agencies, including the NSA and CIA, have contracts with numerous data mining vendors, including Cognos, IBM, and Teradata
The Defense Advanced Research Projects Agency's Terrorism Information Awareness program, a project to mine vast amounts of personal data to identify terrorists, lost congressional funding in 2003 after public outcry over privacy concerns

The NSA declined to comment for this story. If published reports are correct, however, its database would consist partially or entirely of call records. These records include outgoing and incoming phone numbers, time stamps, and other information, such as whether the call had been forwarded, but not names. USA Today reported that AT&T, BellSouth, and Verizon gave the government access to call data records starting in late 2001. The ambitious goal, according to an unnamed source quoted by the paper, is to put "every call ever made" in the United States into the database. Verizon and BellSouth last week said they weren't involved, though Verizon didn't specify whether MCI, which it acquired last year, ever participated in such activity.

AT&T, which neither confirmed nor denied the report, handles about a third of the calls made in the United States and operates some 49.4 million phone lines. AT&T manages a database called Hawkeye that contained 312 terabytes of uncompressed data as of September, representing 1.88 trillion call records. That comes out to 166 bytes per call record.

Say the number of calls made by AT&T customers averages about 10 per phone line a day. If the NSA has access to five years of AT&T calls, its alleged database would contain about 150 terabytes of call records. Compare that with the largest commercial databases. As of late last year, Wal-Mart stored about 583 terabytes of data in a massively parallel, 1,000-processor NCR Teradata data warehouse, and it was adding a billion records a day.
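The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the article's arithmetic: the Hawkeye figures, the line count, the 10-calls-a-day average, and the five-year window come from the text, while the 365-day year, the decimal definition of a terabyte, and all function and constant names are assumptions made for illustration.

```python
# Figures quoted in the article for AT&T's Hawkeye database.
HAWKEYE_BYTES = 312e12      # 312 terabytes, assuming decimal terabytes (10**12 bytes)
HAWKEYE_RECORDS = 1.88e12   # 1.88 trillion call records
ATT_PHONE_LINES = 49.4e6    # about 49.4 million phone lines

# Assumptions used in the article's estimate.
CALLS_PER_LINE_PER_DAY = 10
YEARS = 5
DAYS_PER_YEAR = 365         # illustrative assumption: leap days are ignored


def bytes_per_record(total_bytes, total_records):
    """Average size of one call record in bytes."""
    return total_bytes / total_records


def estimated_records(lines, calls_per_day, years, days_per_year=DAYS_PER_YEAR):
    """Number of call records generated over the given period."""
    return lines * calls_per_day * days_per_year * years


def estimated_terabytes(records, record_bytes):
    """Total storage in decimal terabytes for the given record count."""
    return records * record_bytes / 1e12


record_size = bytes_per_record(HAWKEYE_BYTES, HAWKEYE_RECORDS)
records = estimated_records(ATT_PHONE_LINES, CALLS_PER_LINE_PER_DAY, YEARS)
size_tb = estimated_terabytes(records, record_size)

print(f"Average record size: {record_size:.0f} bytes")
print(f"Estimated records over {YEARS} years: {records:.3e}")
print(f"Estimated database size: {size_tb:.0f} terabytes")
```

Under these assumptions the estimate lands at roughly 900 billion records and about 150 terabytes, in line with the figure quoted above. The result scales linearly with each input, so doubling the assumed calls per line would double the estimated size.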

Heavy-Duty Management

Any database software the NSA might be using would need vast amounts of storage and heavy-duty data management capabilities. Surveys by Winter Corp., a database consulting firm, have found that the largest databases are tripling in size every two years. "It's got to be able to load huge volumes of data rapidly and in a highly parallel way, and to search data in a highly parallel and efficient way," says company president Richard Winter.

A handful of commercial relational databases--from IBM, Oracle, Sybase, and Teradata--might be able to handle a vast volume of phone records, or the NSA could build such a database itself. AT&T, for example, has contracts with Teradata and IBM, but the carrier's big Daytona database was developed internally.

More powerful servers, falling storage prices, and new search and data mining techniques are all working in the NSA's favor. "Ten years ago, you couldn't have accomplished the same thing," Winter says. "It would have been too expensive to put all the information online, and we didn't have the systems capable of searching and mining at high speed."

Still, it's questionable how successful the NSA could be mining data on just some of the calls made within the United States. More than 1,000 wireless carriers, Internet service providers, rural phone companies, voice-over-IP service providers, and long distance companies handle phone calls. For a complete picture, the NSA would need to draw in much of that data, and the more data, the bigger the task. "The history of the intelligence community is information glut," says Mark Pollitt, a former FBI agent and an adjunct professor at Johns Hopkins' School of Professional Studies in Business and Education. "We're good at collecting stuff, but how do you figure out if any of it is any good? This is perhaps the toughest issue with regard to counterterrorism."

with Larry Greenemeier and Elena Malykhina

