Your Data Is Biased, Here's Why - InformationWeek
10/11/2017
07:00 AM
Lisa Morgan

Your Data Is Biased, Here's Why

Biased data can lead to bad decisions. Most business leaders aren't aware of the problem just yet, but they need to be because they're ultimately responsible.

Bias is everywhere, including in your data. A little skew here and there may be fine if the ramifications are minimal, but bias can negatively affect your company and its customers if left unchecked, so you should make an effort to understand how, where and why it happens. 

"Many [business leaders] trust the technical experts but I would argue that they're ultimately responsible if one of these models has unexpected results or causes harm to people's lives in some way," said Steve Mills, a principal and director of machine intelligence at technology and management consulting firm Booz Allen Hamilton.


In the financial industry, for example, biased data may produce results that violate the Equal Credit Opportunity Act (fair lending). That law, enacted in 1974, prohibits credit discrimination based on race, color, religion, national origin, sex, marital status, age or receipt of public assistance income. While lenders take steps to exclude such data from loan decisions, a model may still be able to infer race from other fields, such as a zip code.
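One rough way to see whether a seemingly neutral field acts as a proxy is to measure how well that field alone predicts the protected attribute. The Python sketch below is only an illustration under stated assumptions: the `proxy_strength` helper, the `ZIP_X`/`ZIP_Y` values, the group labels and the sample counts are all hypothetical and do not come from any lender's data or process.

```python
from collections import Counter, defaultdict

def proxy_strength(records):
    """Return (baseline, proxy_accuracy) for a list of (feature, group) pairs.

    baseline: accuracy of always guessing the most common group overall.
    proxy_accuracy: accuracy of guessing the most common group within each
    feature value. A large gap suggests the feature leaks the group.
    """
    if not records:
        raise ValueError("records must not be empty")

    overall = Counter(group for _, group in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    by_feature = defaultdict(Counter)
    for feature, group in records:
        by_feature[feature][group] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / len(records)

# Hypothetical applicants: (zip code, protected group). Made-up data.
records = (
    [("ZIP_X", "A")] * 45 + [("ZIP_X", "B")] * 5
    + [("ZIP_Y", "A")] * 5 + [("ZIP_Y", "B")] * 45
)

baseline, proxy_acc = proxy_strength(records)
print(f"guessing without zip: {baseline:.2f}, guessing with zip: {proxy_acc:.2f}")
```

In this made-up sample, knowing the zip code lifts the guess well above the baseline, which is the kind of gap that would warrant a closer look before the field is used in a model.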

"The best example of [bias in data] is the 2008 crash in which the models were trained on a dataset," said Shervin Khodabandeh, a partner and managing director of Boston Consulting Group (BCG) Los Angeles, a management consulting company. "Everything looked good, but the datasets changed and the models were not able to pick that up, [so] the model collapsed and the financial system collapsed."
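A change in the data of the kind Khodabandeh describes can sometimes be caught by comparing the distribution a model was trained on with the distribution it sees later. The sketch below uses the Population Stability Index (PSI), a common drift measure in credit modeling; it is an illustration only, not a description of BCG's method. The bin edges, the sample scores and the 0.25 alert level (a widely quoted rule of thumb) are assumptions.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins.

    edges: sorted bin boundaries; values below edges[0] fall in bin 0.
    eps: floor on bin shares so empty bins do not cause log(0).
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of v's bin
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical risk scores: what the model was trained on vs. what arrives later.
edges = [0.3, 0.6]
training = [0.1] * 40 + [0.4] * 40 + [0.8] * 20
recent = [0.1] * 10 + [0.4] * 30 + [0.8] * 60

drift = psi(training, recent, edges)
ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold, not a standard
print(f"PSI = {drift:.2f}, investigate = {drift > ALERT_LEVEL}")
```

A check like this does not say why the data changed, only that the model is now being asked about a population it was not trained on.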


What Causes Bias in Data

A considerable amount of data has been generated by humans, whether it's the diagnosis of a patient's condition or the facts associated with an automobile accident. Quite often, individual biases are evident in the data, so when such data is used to train machine learning models, the models reflect that bias. A prime example was Microsoft's infamous AI bot, Tay, which in less than 24 hours adopted the biases of certain Twitter users. The result was a string of shocking, offensive and racist posts.

"There's a famous case in Broward County, Florida, that showed racial bias," said Mills. "What appears to have happened is there was historically racial bias in sentencing so when you base a model on that data, bias flows into the model. At times, bias can be extremely hard to detect and it may take as much work as building the original model to tease out whether that bias exists or not."

What Needs to Happen

Business leaders need to be aware of bias and the unintended consequences biased data may cause. In the longer term, data-related bias is a governance issue that needs to be addressed with appropriate checks and balances, which include awareness, mitigation and a game plan should matters go awry.

"You need a formal process in place, especially when you're impacting people's lives," said Booz Allen Hamilton's Mills. "If there's no formal process in place, it's a really bad situation. Too many times we've seen these cases where issues are pointed out, and rather than the original people who did the work stepping up and saying, 'I see what you're seeing, let's talk about this,' they get very defensive and defend their approach so I think we need to have a much more open dialog on this."

As a matter of policy, business leaders need to consider which decisions they're comfortable allowing algorithms to make, the safeguards that ensure the algorithms remain accurate over time, and model transparency, meaning that the reasoning behind an automated decision or recommendation can be explained. That's not always possible, but business leaders should still endeavor to understand the reasoning behind decisions and recommendations.


"The tough part is not knowing where the biases are there and not taking the initiative to do adequate testing to find out if something is wrong," said Kevin Petrasic, a partner at law firm White & Case.  "If you have a situation where certain results are being kicked out by a program, it's incumbent on the folks monitoring the programs to do periodic testing to make sure there's appropriate alignment so there's not fair lending issues or other issues that could be problematic because of key datasets or the training or the structure of the program."
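The periodic testing Petrasic describes could start with something as simple as comparing approval rates across groups in a program's recent decisions. The following Python sketch is a minimal illustration, not a legal test for fair lending: the helper names, the group labels and the decision counts are hypothetical, and the 0.8 threshold is the "four-fifths" rule of thumb borrowed from U.S. employment-selection guidance, used here only as an assumed trigger for further review.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Map each group to its approval rate, given (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += bool(ok)
    return {g: approved[g] / totals[g] for g in totals}

def impact_ratio(rates):
    """Lowest group approval rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so no disparity can be measured
    return min(rates.values()) / highest

# Hypothetical output of an automated lending program over one review period.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 42 + [("B", False)] * 58
)

rates = approval_rates(decisions)
ratio = impact_ratio(rates)
REVIEW_THRESHOLD = 0.8  # assumed rule of thumb, not a fair lending standard
print(rates, f"ratio={ratio:.2f}", "needs review" if ratio < REVIEW_THRESHOLD else "ok")
```

A low ratio does not prove discrimination, since groups may differ in legitimate credit factors, but it flags where the deeper testing should begin.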

Data scientists know how to compensate for bias, but they often have trouble explaining what they did and why they did it, or the output of a model in simple terms. To bridge that gap, BCG's Khodabandeh uses two models: one that's used to make decisions and a simpler model that explains the basics in a way that clients can understand.
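The two-model idea can be sketched as a "surrogate": a deliberately simple rule fitted to the decisions of the more complex model, so that the rule can be read aloud to a non-specialist. The Python example below is an assumption-laden illustration and not BCG's actual approach. The `black_box` function stands in for a real decision model, the income and debt-ratio inputs are made up, and the surrogate is a one-rule "decision stump".

```python
def black_box(income_k, debt_ratio):
    """Hypothetical stand-in for a complex decision model (True = approve)."""
    return income_k / 100 > debt_ratio

def fit_stump(rows, labels):
    """Find the single-feature threshold rule that best matches the labels.

    Returns (agreement, feature_index, threshold, approve_if_at_or_above).
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (True, False):
                preds = [(r[f] >= t) == above for r in rows]
                agree = sum(p == y for p, y in zip(preds, labels)) / len(rows)
                if best is None or agree > best[0]:
                    best = (agree, f, t, above)
    return best

# Probe the decision model on a grid of made-up applicants.
rows = [(inc, debt) for inc in (20, 40, 60, 80, 100)
        for debt in (0.1, 0.3, 0.5, 0.7, 0.9)]
labels = [black_box(inc, debt) for inc, debt in rows]

fidelity, feature, threshold, above = fit_stump(rows, labels)
name = ("income (thousands)", "debt ratio")[feature]
direction = "at or above" if above else "below"
print(f"Roughly: approve when {name} is {direction} {threshold} "
      f"(matches the model on {fidelity:.0%} of cases)")
```

The agreement figure matters as much as the rule itself: it tells the audience how much of the real model's behavior the simple explanation leaves out.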

BCG also uses two models to identify and mitigate bias. One is the original model; the other is used to test extreme scenarios.

"We have models with an opposite hypothesis in mind which forces the model to go to extremes," said Khodabandeh. "We also force models to go to extremes. That didn't happen in the 2008 collapse. They did not test extreme scenarios. If they had tested extreme scenarios, there would have been indicators coming in in 2007 and 2008 that would allow the model to realize it needs to adjust itself."

It is safest to assume that bias is present in any dataset. What the bias is, where it stems from, what can be done about it and what its potential outcomes may be are all questions worth examining.

Conclusion

All organizations have biased data.  The questions are whether the bias can be identified, what effect that bias may have, and what the organization is going to do about it.

To minimize the negative effects of bias, business leaders should make a point of understanding the various types and how they can impact data, analysis and decisions. They should also ensure there's a formal process in place for identifying and dealing with bias, which is likely best executed as a formal part of data governance.

Finally, the risks associated with data bias vary greatly, depending on the circumstances. While it's prudent to consider all the positive things machine learning and AI can do for an organization, business leaders would be wise to understand the weaknesses as well, one of which is data bias.

Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit. Frequent areas of coverage include ...
Comments
Purtier,
User Rank: Apprentice
10/28/2017 | 4:14:31 AM
Data Manipulation
Data are supposed to be manipulated, and smart people win. It is just as simple as that. 
LisaMorgan,
User Rank: Moderator
10/25/2017 | 1:02:18 PM
Re: Effective Leadership & Disclosure to Reduce Biased Data
Excellent observations, @mjohnson681.  Apparently we used to work for the same company with the same middle manager.  :)

It's clear that SO much data is so skewed.  Even well-intentioned people tend to cherry-pick information because it dovetails with their POV better or their original hypothesis was off-base but there's so much invested in it already, better see it through.

I've written quite a bit about bias because I think it's such a huge issue and one that doesn't get enough attention.
mjohnson681,
User Rank: Apprentice
10/14/2017 | 2:57:31 PM
Effective Leadership & Disclosure to Reduce Biased Data
Great leaders of the 21st century need to give up the chain-of-command mentality if they are going to be successful. So many times in business, key information is filtered out by middle management. The filters or spins can be to tailor the information for the audience's preferences. Other filters or spins can be placed on data to hide reality, preserve power structures, serve political play, or introduce other unethical omissions or biases in the data. When it comes down to it, the truth is in the trenches, where the people are working on the front line. With today's technology, there should be no excuse for not being able to provide executives or those charged with governance (e.g. Boards of Directors) effective, unbiased information along with full disclosure of assumptions, estimates, etc. Another interesting topic is the level of bias in the government's key economic numbers (e.g. unemployment, inflation, GDP, etc.). Unlike the figures reported by publicly traded companies, these powerful economic indicators that drive the broader stock markets are not required to disclose significant estimates in full or to be audited by an independent party. If one wanted to manipulate equity markets or management decisions, biased data is the perfect vehicle.