Europe's data protection rules have established a "right to be forgotten," to the consternation of technology companies like Google that have built businesses on computational memory. The rules also outline a "right to explanation," by which people can seek clarification about algorithmic decisions that affect them.
In a paper published last month, Bryce Goodman, Clarendon Scholar at the Oxford Internet Institute, and Seth Flaxman, a post-doctoral researcher in Oxford's Department of Statistics, describe the challenges these rights pose to businesses and the opportunities they present to machine learning researchers in designing algorithms that are open to evaluation and scrutiny.
The rationale for requiring companies to explain their algorithms is to avoid unlawful discrimination. In his 2015 book The Black Box Society, University of Maryland law professor Frank Pasquale describes the problem with opaque programming.
"Credit raters, search engines, major banks, and the TSA take in data about us and convert it into scores, rankings, risk calculations, and watch lists with vitally important consequences," Pasquale wrote. "But the proprietary algorithms by which they do so are immune from scrutiny."
Several academic studies have already explored the potential for algorithmic discrimination.
A 2015 study by researchers at Carnegie Mellon University, for example, found that Google showed ads for high-income jobs to men more frequently than to women.
That's not to say Google did so intentionally. But as other researchers have suggested, algorithmic discrimination can be an unintended consequence of reliance on inaccurate or biased data.
Google did not immediately respond to a request to discuss whether it changed its advertising algorithm in response to the research findings.
A 2014 paper from the Data & Society Research Institute echoes the finding that inappropriate algorithmic bias tends to be inadvertent. It states:
Although most companies do not intentionally engage in discriminatory hiring practices (particularly on the basis of protected classes), their reliance on automated systems, algorithms, and existing networks systematically benefits some at the expense of others, often without employers even recognizing the biases of such mechanisms.
Between Europe's General Data Protection Regulation (GDPR), scheduled to take effect in 2018, and existing regulations, companies would do well to pay more attention to the way they implement algorithms and machine learning.
But adhering to the rules won't necessarily be easy, according to Goodman and Flaxman. They note that excluding sensitive data having to do with race or religion, for example, doesn't necessarily mean algorithms will return non-biased results. That's because other non-sensitive data points, like geographic area of residence, may have some correlation with sensitive data.
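The proxy effect the researchers describe can be illustrated with a small sketch. The data below is entirely hypothetical and is not drawn from the paper: two made-up regions, two made-up groups labeled "A" and "B", and an invented `approve` rule that looks only at region. It shows how a rule that never sees the sensitive attribute can still produce very different outcomes per group when region and group are correlated.

```python
from collections import defaultdict

# Hypothetical applicants as (region, group) pairs.
# Group membership is correlated with region, but the rule never sees it.
applicants = (
    [("north", "A")] * 80 + [("north", "B")] * 20 +
    [("south", "A")] * 20 + [("south", "B")] * 80
)

def approve(region):
    """A 'blind' decision rule that uses only the non-sensitive feature."""
    return region == "north"

def approval_rate_by_group(records):
    """Audit step: compare outcomes across the sensitive attribute."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for region, group in records:
        totals[group] += 1
        if approve(region):
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}

rates = approval_rate_by_group(applicants)
print(rates)  # {'A': 0.8, 'B': 0.2}
```

In this toy setup, dropping the sensitive column changes nothing about the outcome gap, which is why an audit of this kind needs access to the sensitive attribute even when the decision rule does not use it.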
What's more, the researchers observe that many large data sets are the product of multiple smaller data sets. That derivation makes it difficult, if not impossible, for organizations to vouch for the integrity, accuracy, and neutrality of their data.
"The GDPR thus presents us with a dilemma with two horns: Under one interpretation the non-discrimination requirement is ineffective, under the other it is infeasible," write Goodman and Flaxman.
In a phone interview, Lokke Moerel, senior of counsel at Morrison & Foerster, said the provision on automated decision making is not new.
Under the current Data Protection Directive (distinct from the GDPR and from the criminal-oriented Directive to be implemented by May 6, 2018), companies already have to inform individuals about the underlying logic involved in their automated decisions.
Moerel acknowledged the difficulties of the rules, noting that in an era where algorithms are dynamic and self-learning, it's very difficult to know how an algorithm made a decision at any point in time, let alone communicate this to an individual in a meaningful manner. If the logic is incomprehensible to the vast majority of people, the question becomes: What is the added value of providing this information in the first place?
Moerel said she found it troubling that algorithms can end up being discriminatory through data correlation. As an example, she noted that an insurance company charging higher premiums in a certain region because of higher accident rates could end up discriminating against a specific ethnic group that happens to live in that area.
She also suggested there's a risk that companies may try to hide such discriminatory correlations by performing further analytics and finding other nonsensitive data points that they know are correlated with the sensitive data. Requiring the disclosure of algorithmic logic guards against such action, she said.
In order to avoid being questioned about algorithmic logic, Moerel suggested companies give individuals affected by their decisions more control over the implications of how data is used (e.g., by giving them control over their ad preferences, whereby they can view and adjust the indicators that triggered the relevant advertisement for the visitor).
"It will help to avoid individuals questioning your logic if you give them control of the triggers that matter to them," she said. "If people are looking at a black box, it won't be acceptable for European regulators."
Goodman and Flaxman say that work is already underway to make algorithms more easily subject to inspection. They remain optimistic that technical code can coexist with the legal code.
"We believe that, properly applied, algorithms can not only make more accurate predictions, but offer increased transparency and fairness over their human counterparts," they conclude.
[Editor's note: The text reference and link to the current Data Protection Directive were corrected.]