The First Rule of Data: Do No Harm
At the Strata Big Data Conference in New York, one of the major themes was the responsibility that data scientists have to do their best to prevent the biases and prejudices that exist in society from creeping into data and the way algorithms are built.
October 18, 2017
When it comes to medicine, the first rule of ethics is to do no harm. When mathematician and data scientist Cathy O’Neil delivered her keynote at the Strata Big Data Conference in New York recently, she said the same first rule should apply when building algorithms.
“Algorithms and AI are not objective,” she said. “They’re opinions embedded in code.” O’Neil’s talk, which took its title from her book, Weapons of Math Destruction, was about how, even with good intentions, data scientists can create toxic algorithms that end up doing harm instead of good. That can mean failing at the very problem an algorithm was meant to solve, such as keeping good teachers or hiring fairly.
When it comes to how we create algorithms, said O’Neil in her keynote, “we haven’t established standards on what is good enough.”
One way O’Neil says we can look to correct this issue is by examining the laws and regulations that govern each industry.
“It’s one thing to build an algorithm that you find useful, like how much to exercise, but if you are building an algorithm that’s used widely for important decisions that are already regulated, then you have to ask the question: What is the industry set standard for this and how do I make sure I'm meeting that standard?”
If this issue is ignored, O’Neil says, we should expect the inequality gap to widen and democracy to continue to dissolve.
In another Strata keynote, Danah Boyd, founder of the organization Data & Society, described how AI can reflect the biases of humans and society in general. For example, consider search results: “Google learned American prejudices and racism and amplified it back at its users,” she says.
And it’s not just unconscious data bias that Boyd warned attendees about. She also talked about intentional manipulation of data, such as how a misinformation campaign built on a conspiracy theory led to #pizzagate during the 2016 presidential election cycle, and how Russian companies took out ads on Facebook during the election and posted largely false information that was meant to confuse voters.
I asked both Boyd and O’Neil whether they felt that the data science industry was aware of the bias and bad algorithms. Both felt there were companies that were aware of the problem and others that were in the dark. “I think people are aware of [the problem] and sadly, especially in the large powerful companies, they do not feel empowered to fix [bias] because they don’t have power or they are setting themselves up for legal liability,” said O’Neil. “There are always people who simply will not see it even if it’s obvious.”
The industry seems to be largely at the stage of building awareness, but O’Neil says she recently learned that the company Meetup is trying to make sure that its recommendation system isn’t sexist. “I’m not saying they’re doing a perfect job,” says O’Neil, “but they’re making an effort and that’s a big deal.”
Boyd also said she’s seeing companies that are trying to use and gather data ethically.
“At the biggest mainstream consumer companies (e.g., Microsoft, Facebook, Google, etc.), you’re seeing some sophisticated moves in this direction. It’s not uniform across these companies, but these issues are getting traction and I’m excited by the potential,” says Boyd.
The EU is another example of an institution working to combat biased results. In 2012, the EU decided it would apply regulations that would prevent auto insurance companies from discriminating on the basis of gender. The fear was that once these regulations were in place, women would have to pay more in premiums as prices for men and women converged toward the middle. However, when everything was settled and prices were set based on factors such as driving record and the number of miles driven, the gap between the prices paid by men and women widened, largely in favor of women’s pocketbooks.
“I find this case study fascinating,” says Michael O’Connell, chief analytics officer at TIBCO. “EU insurance companies, by being forced to price premiums to avoid gender bias, they appear to have done a great job by including a myriad of other data attributes, and these data attributes have reduced bias and more accurately aligned risk with premiums.”
O’Connell says that TIBCO played a role in this issue because one of their customers, the Automobile Association of Ireland, used TIBCO products to develop models for the dynamic pricing of car insurance “in a way that equitably aligns with accident risk and avoids biases, such as gender bias.” O’Connell says that as AA of Ireland deployed their software for pricing applications, they also found that the software could be used to identify fraud.
Times are changing, and the analytics industry and data scientists are finding that you can’t just build an algorithm and let it run. You have to monitor results and constantly revise the algorithm to solve for equality and fairness. Otherwise, we could end up with machine learning and artificially intelligent devices that are solving for bias.
“The tech industry is no longer the passion play of a bunch of geeks trying to do cool [stuff] in the world,” said Boyd during her keynote at Strata. “It’s now the foundation of our democracy, economy, and information landscape. We no longer have the luxury of only thinking about the world we want to build. We must also now start thinking about how others might manipulate our systems, undermine our technologies with an eye on doing harm and causing chaos.”