Toward the end of the decade, companies will not only be able to better monitor the contents of data sent over the network, they'll also be able to determine whether seemingly innocuous bits of information about customers, employees, and partners can be pieced together by criminals to gain access to more sensitive information. Call it an inferential data threat.
"Given the amount of information out there, you need some at least semiautomated way of figuring out what information you can and can't release," says Jessica Staddon, area manager of the Palo Alto Research Center's security and privacy research group.
PARC has created prototype privacy monitoring software designed to understand the inferences that can be drawn from data--the meaning of a name, address, or other piece of information--so that sensitive data can be removed, or obfuscated in the case of an electronic document, before it's sent out across the network. For example, if the privacy monitor determines that only one person in a database has a certain combination of attributes--female, born in 1969, lives in the 94061 ZIP code--then it would prohibit those three pieces of data from being accessed together unless the person accessing them had specific permission to do so. This would help protect databases accessed through Web applications from being pilfered via SQL injection attacks, which try to trick Web apps into extracting information the attacker has no right to see and that can be used later to acquire more sensitive information.
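The uniqueness check in that example can be illustrated with a short sketch. This is not PARC's software, whose internals the article doesn't describe. It is a minimal Python illustration that assumes records are plain dictionaries, and the field names, the `may_release` function, and the `min_group_size` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical attribute names; the article names only the values
# (female, born in 1969, ZIP code 94061), not a schema.
QUASI_IDENTIFIERS = ("sex", "birth_year", "zip_code")


def combination_counts(records, attributes):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def may_release(records, requested, row, has_permission=False, min_group_size=2):
    """Allow a row's requested attributes to be read together only if at least
    min_group_size records share that combination, or the caller has
    explicit permission. The threshold is an assumption for illustration."""
    if has_permission:
        return True
    counts = combination_counts(records, requested)
    key = tuple(row[a] for a in requested)
    return counts[key] >= min_group_size


# Made-up sample data for illustration only.
records = [
    {"name": "A", "sex": "F", "birth_year": 1969, "zip_code": "94061"},
    {"name": "B", "sex": "F", "birth_year": 1969, "zip_code": "94062"},
    {"name": "C", "sex": "M", "birth_year": 1969, "zip_code": "94061"},
    {"name": "D", "sex": "F", "birth_year": 1972, "zip_code": "94061"},
]

# Only record A is female, born in 1969, and in 94061, so the three
# attributes together would single her out.
blocked = not may_release(records, QUASI_IDENTIFIERS, records[0])
# Birth year alone is shared by three records, so it is allowed.
allowed = may_release(records, ("birth_year",), records[0])
```

In this sketch each attribute is harmless by itself and only the combination is blocked, which is the essence of the inferential threat the researchers describe. A production system would also have to track what a requester has already seen across separate queries.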
Staddon and her team at PARC, a subsidiary of Xerox, envision a network security application that sends the end user--whether it's a blogger, HR manager, or CFO--a warning if data in a file could be used as part of a larger inference. Another option is to integrate this capability into Word, Excel, or whichever tool is used to create the file or e-mail. However, this type of application is much more difficult to develop than one that examines the inferences that can be drawn from data extracted from a database. Data contained in documents is usually unstructured, and the number of possible inferences could grow much larger as the number of people with access to the information grows.
Part of PARC's work on data inference emerged from technology it was planning to develop through a grant from the Defense Advanced Research Projects Agency for its Total Information Awareness project. In 2002 Darpa presented TIA as a way to detect, classify, identify, and track terrorists to prevent attacks. The thinking was that law enforcement could use a combination of biometric, database, natural language processing, evidence extraction, and inferential technology to collect information about transactions made by terrorists before an attack, and thus head off trouble.
Public concern about the misuse of data forced the government to discontinue funding TIA the following year, but PARC's research has continued. "We did deliver some code to Darpa, but we didn't go as far with the project as we would have if the funding had continued," Staddon says. PARC is hoping that it can work with Xerox to bring an automated content inference application to market.
The goal of much of the laboratory work at PARC and elsewhere is to "get to the point where computers are doing a lot more work to check to see what's happening, and any abnormal conditions would be responded to automatically," HP's Redmond says. Computers looking out for computers--now there's an idea with potential.
Illustration by David Moir/Reuters