Debiasing Our Statistical Algorithms Down to Their Roots
Eliminating bias in the data and algorithms that drive artificial intelligence and machine learning initiatives requires constant vigilance, not only from data scientists but from people up and down the corporate ranks.
Rest assured that AI, machine learning (ML), and other statistical algorithms are not inherently biased against the human race. Whatever their limitations, most have a bias in favor of action. Algorithms tend to make decisions faster than human decision makers, do so more efficiently and consistently, consider far more relevant data and variables, and leave a more comprehensive audit trail. For their part, human decision makers often have biases of their own, such as giving undue weight to their personal experiences, basing decisions on comfortable but unwarranted assumptions, and considering only data that supports their preconceived beliefs.
Nevertheless, let’s not underestimate the extent to which data-driven algorithms can embed serious biases that operate to the disadvantage of demographic groups with “protected attributes,” such as race, gender, age, religion, ethnicity, and national origin. Fortunately, most data scientists have been trained to prevent, detect, and reduce biases at every step in the ML lifecycle.
Even the best data scientists may embed biases in their handiwork without realizing it. To address this issue comprehensively, they should follow these guidelines:
Don’t overweight the importance of bias reduction: Recognize that bias reduction is only one of the outcomes that algorithm-driven decision-making processes are designed to promote. Algorithmic processes may also need to ensure that other outcomes — such as boosting speed, efficiency, throughput, profitability, quality, and the like — are achieved as well, without unduly compromising fairness. Trade-offs among conflicting outcomes must always be addressed openly. For example, boosting predictive-policing algorithms’ effectiveness may call for models that recommend law-enforcement practices that some demographic groups may consider unwarranted profiling. Likewise, increasing the effectiveness of multichannel marketing may require algorithmic approaches that encroach on privacy.
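One way to address such trade-offs openly is to report a fairness metric right next to the performance metric it competes with. The minimal sketch below assumes a binary classifier with a single decision threshold and uses invented toy records; it reports accuracy alongside a demographic-parity gap (the difference between the highest and lowest positive-prediction rates across groups) at several thresholds. Both metric choices are illustrative assumptions, not a prescribed method.

```python
from typing import List, Tuple

# Hypothetical toy records: (model_score, true_label, group). Illustrative only.
Record = Tuple[float, int, str]

RECORDS: List[Record] = [
    (0.9, 1, "A"), (0.8, 1, "A"), (0.7, 0, "A"), (0.4, 0, "A"),
    (0.6, 1, "B"), (0.5, 0, "B"), (0.3, 1, "B"), (0.2, 0, "B"),
]


def accuracy(records: List[Record], threshold: float) -> float:
    """Share of records where the thresholded score matches the true label."""
    correct = sum(1 for score, label, _ in records if (score >= threshold) == (label == 1))
    return correct / len(records)


def selection_rate_gap(records: List[Record], threshold: float) -> float:
    """Largest minus smallest positive-prediction rate across groups (a demographic-parity gap)."""
    rates = []
    for group in sorted({g for _, _, g in records}):
        scores = [s for s, _, g in records if g == group]
        rates.append(sum(1 for s in scores if s >= threshold) / len(scores))
    return max(rates) - min(rates)


def sweep(records: List[Record], thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Report (threshold, accuracy, selection-rate gap) so the trade-off is visible."""
    return [(t, accuracy(records, t), selection_rate_gap(records, t)) for t in thresholds]


if __name__ == "__main__":
    for t, acc, gap in sweep(RECORDS, [0.3, 0.5, 0.75]):
        print(f"threshold={t:.2f} accuracy={acc:.3f} gap={gap:.3f}")
```

On this toy data, raising the threshold from 0.5 to 0.75 lifts accuracy from 0.625 to 0.750 but doubles the gap from 0.25 to 0.50. Which point is acceptable is a decision to make and document explicitly, which is the point of the guideline.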
Apply statistical bias-reduction best practices: Identify and correct biases in statistical estimators that may be due to underrepresentation of individuals with protected attributes within the data set used to build and train algorithms. For example, some ill-trained AI-powered tools have auto-tagged images of African-Americans as apes, have struggled to recognize people with dark skin tones, and have falsely assessed face shots of Asians as showing people with their eyes closed. For further details on how data scientists weed out biases throughout ML development, please check out this post of mine from a few years ago.
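One standard way to correct an estimator for underrepresentation is post-stratification reweighting: each record is weighted by its group's share of the target population divided by that group's share of the sample. The sketch below is a minimal illustration, assuming the true population shares are known (often the hard part in practice); the groups, outcomes, and shares are invented.

```python
from collections import Counter
from typing import Dict, List, Sequence


def poststratification_weights(groups: Sequence[str],
                               population_shares: Dict[str, float]) -> List[float]:
    """Weight each record by (population share / sample share) of its group.

    Underrepresented groups get weights above 1, overrepresented groups below 1.
    Assumes every group in `groups` has a known population share.
    """
    counts = Counter(groups)
    n = len(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: group "B" is 50% of the population but only 20% of the sample.
GROUPS = ["A"] * 8 + ["B"] * 2
OUTCOMES = [1, 1, 1, 1, 1, 1, 0, 0] + [1, 0]  # A: 6/8 positive, B: 1/2 positive
SHARES = {"A": 0.5, "B": 0.5}

naive = sum(OUTCOMES) / len(OUTCOMES)  # dominated by the overrepresented group
corrected = weighted_mean(OUTCOMES, poststratification_weights(GROUPS, SHARES))

if __name__ == "__main__":
    print(f"naive estimate: {naive:.3f}, reweighted estimate: {corrected:.3f}")
```

Here the naive positive rate is 0.700, while the reweighted estimate is 0.625, the average of the two group rates as the population shares require. Reweighting can only stretch the data you have; it cannot supply representative data that was never collected.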
Stay vigilant for buried bias: Flag the extent to which the data being gathered, though apparently neutral to protected attributes, is in fact strongly correlated with them and may implicitly serve as a proxy for them, inadvertently baking bias into the decisioning algorithms built on that data. For example, it may seem harmless to ask people for their zip code, income, education, occupation, and the like, but those variables can often be tied back to protected attributes such as race, ethnicity, and national origin.
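A simple screen for such proxies is to measure the association between each apparently neutral categorical feature and the protected attribute. The minimal sketch below uses Cramér's V (0 means no association, 1 means perfect association) as one possible measure. The feature names, the data, and the 0.5 flagging threshold are all hypothetical; continuous variables such as income would need binning or a different measure.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Cramér's V between two categorical variables: 0 = no association, 1 = perfect."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected if x and y were independent
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    return 0.0 if k < 2 else math.sqrt(chi2 / (n * (k - 1)))


def flag_proxies(features: Dict[str, Sequence[Hashable]], protected: Sequence[Hashable],
                 threshold: float = 0.5) -> List[str]:
    """Names of features whose association with the protected attribute meets the (hypothetical) threshold."""
    return sorted(name for name, values in features.items()
                  if cramers_v(values, protected) >= threshold)


# Hypothetical data: zip code tracks the protected attribute perfectly; plan tier does not.
PROTECTED = ["g1", "g1", "g2", "g2"]
FEATURES = {
    "zip_code": ["10001", "10001", "20002", "20002"],
    "plan_tier": ["basic", "premium", "basic", "premium"],
}

if __name__ == "__main__":
    print(flag_proxies(FEATURES, PROTECTED))
```

With this toy data only `zip_code` is flagged. A flagged feature is not automatically unusable, but its inclusion should be justified and its downstream effect on outcomes audited.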
Recognize when bias is hard-coded into society: Assess the extent to which current institutional arrangements make it difficult to gather the representative “ground truth” data you’ll need to correct statistical biases. This may be because the protected attributes are associated with disadvantaged, underprivileged, and otherwise disenfranchised demographic groups that have historically been unable or unwilling to engage with the businesses, government agencies, and other organizations that collect this data. This is one of the “Catch-22s” of bias elimination: You can’t easily eliminate biases and thereby help these groups better their socioeconomic status if you can’t gather enough representative data to debias the relevant decisioning algorithms. Similarly, the underrepresentation of some demographic groups in the data tends to produce correspondingly wider statistical confidence intervals for estimates about them. This can make predictive engagement with disadvantaged groups appear riskier, even if it objectively is not, further contributing to their disenfranchisement.
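The confidence-interval effect can be shown with a little arithmetic. The sketch below assumes two hypothetical groups with the same observed event rate (10%) but very different sample sizes, and uses the simple normal-approximation (Wald) interval for a proportion; a Wilson interval is usually better for small samples but makes the same point. The "conservative" rule that scores risk by the interval's upper bound, and its 0.15 limit, are invented for illustration.

```python
import math
from typing import Tuple


def wald_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) ~95% interval for a proportion, clipped to [0, 1]."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical groups with the SAME observed rate (10%) but very different sample sizes.
large = wald_interval(100, 1000)  # roughly (0.081, 0.119)
small = wald_interval(5, 50)      # roughly (0.017, 0.183)

# Hypothetical risk-averse rule: flag any group whose upper bound exceeds the limit.
UPPER_BOUND_LIMIT = 0.15

if __name__ == "__main__":
    for name, (low, high) in [("large", large), ("small", small)]:
        status = "flagged" if high > UPPER_BOUND_LIMIT else "ok"
        print(f"{name}: ({low:.3f}, {high:.3f}) {status}")
```

The smaller group's interval is about 4.5 times wider (the half-width scales with 1/sqrt(n)), so a rule keyed to the upper bound flags it as riskier even though both groups have an identical point estimate.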
Avoid building algorithms that inadvertently learn society’s biases: Natural language processing (NLP) algorithms are central to many AI initiatives. Data scientists should institute ongoing “word embedding association tests” to find biases against protected attributes that are implicit in the word-representation vectors behind many NLP algorithms. These tests target the phenomenon whereby implicit linguistic associations (such as people’s tendency to associate “doctor” with “man” and “nurse” with “woman”) embed themselves in NLP models through their statistical affinities within the underlying textual training data set. Researchers have built tools that can debias these embeddings by adjusting the relative statistical distances between the affected words within the relevant NLP algorithm.
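To make the idea concrete, here is a minimal sketch of a WEAT-style test and one simple debiasing adjustment. It computes how much closer each target word sits to one attribute set ("man", "he") than to another ("woman", "she"), summarizes the difference between two target sets as an effect size, and then applies a simplified "neutralize" step that removes each target word's component along a gender direction. The 3-D vectors are invented; real embeddings have hundreds of dimensions, and published debiasing methods typically estimate the direction from several word pairs rather than the single he/she pair assumed here.

```python
import math
from statistics import mean, stdev
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """How much closer word w sits to attribute set A than to attribute set B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def differential_association(X, Y, A, B) -> float:
    """Mean association of target set X minus that of target set Y."""
    return mean([association(x, A, B) for x in X]) - mean([association(y, A, B) for y in Y])


def weat_effect_size(X, Y, A, B) -> float:
    """Differential association scaled by the sample std. dev. over all target words."""
    scores = [association(w, A, B) for w in X + Y]
    return differential_association(X, Y, A, B) / stdev(scores)


def neutralize(w: Vector, direction: Vector) -> Vector:
    """Remove w's component along `direction` (a simplified hard-debiasing step)."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    projection = sum(a * b for a, b in zip(w, unit))
    return [a - projection * b for a, b in zip(w, unit)]


# Hypothetical 3-D "embeddings", invented for illustration only.
EMB: Dict[str, Vector] = {
    "man": [1.0, 0.0, 0.5], "woman": [-1.0, 0.0, 0.5],
    "he": [1.0, 0.2, 0.0], "she": [-1.0, 0.2, 0.0],
    "doctor": [0.4, 1.0, 0.0], "engineer": [0.3, 0.9, 0.1],
    "nurse": [-0.4, 1.0, 0.0], "homemaker": [-0.3, 0.9, 0.1],
}
A = [EMB["man"], EMB["he"]]
B = [EMB["woman"], EMB["she"]]
X = [EMB["doctor"], EMB["engineer"]]
Y = [EMB["nurse"], EMB["homemaker"]]

gender_direction = [h - s for h, s in zip(EMB["he"], EMB["she"])]
before = weat_effect_size(X, Y, A, B)
X_debiased = [neutralize(w, gender_direction) for w in X]
Y_debiased = [neutralize(w, gender_direction) for w in Y]
after = differential_association(X_debiased, Y_debiased, A, B)

if __name__ == "__main__":
    print(f"effect size before: {before:.2f}; differential association after: {after:.2e}")
```

In this toy setup the effect size before debiasing is large (about 1.7), and after neutralization the differential association drops to zero. The post-debiasing result is reported as the raw difference because every target word's score becomes identical, which would make the effect size's denominator zero.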
Conduct regular bias audits of your algorithms: Data scientists should make debiasing a standard risk-mitigation checkpoint in the ML DevOps pipeline. For example, the open-source bias-auditing toolkit Aequitas assesses ML biases that may discriminate against protected attributes such as race, gender, ethnicity, age, and so on. Developed by the University of Chicago’s Center for Data Science and Public Policy, Aequitas can audit algorithmic systems to look for biased actions or outcomes that are based on false or skewed assumptions about various demographic groups. Using Python or a command-line interface, users simply upload data from the system being audited, configure bias metrics for protected attribute groups of interest as well as reference groups, and then the tool generates bias reports. In Aequitas, bias assessments can be made before a model is operationalized, evaluating its performance on whatever training data was used to tune it for its task. Alternatively, audits can be performed post-production, based on operational data showing how biased the model proved to be in live environments. Or they can involve a bit of both, auditing bias in an A/B testing environment in which limited trials of revised algorithms are evaluated vis-à-vis whatever biases were observed in those same systems in prior production deployments.
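The core of a reference-group audit like the one described above can be sketched in a few lines. This is not Aequitas's API; it is a standard-library illustration, under stated assumptions, of the same idea: compute a metric per group, divide it by the reference group's value, and flag groups whose disparity falls outside a tolerance band. The group names, rows, and the 0.8 tolerance (which flags ratios outside [0.8, 1.25]) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, int]  # (group, true_label, predicted_label), labels are 0/1


def group_metrics(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Per-group false positive rate (FPR) and predicted positive rate (PPR)."""
    t = defaultdict(lambda: {"n": 0, "pred_pos": 0, "neg": 0, "false_pos": 0})
    for group, label, pred in rows:
        c = t[group]
        c["n"] += 1
        c["pred_pos"] += pred
        if label == 0:
            c["neg"] += 1
            c["false_pos"] += pred
    return {
        g: {
            "fpr": c["false_pos"] / c["neg"] if c["neg"] else float("nan"),
            "ppr": c["pred_pos"] / c["n"],
        }
        for g, c in t.items()
    }


def disparities(metrics, reference: str, metric: str) -> Dict[str, float]:
    """Each group's metric divided by the reference group's (assumes a nonzero reference value)."""
    ref = metrics[reference][metric]
    return {g: m[metric] / ref for g, m in metrics.items()}


def flag(disp: Dict[str, float], tau: float = 0.8) -> List[str]:
    """Groups whose disparity falls outside [tau, 1/tau]; NaN disparities are also flagged."""
    return sorted(g for g, d in disp.items() if not (tau <= d <= 1 / tau))


# Hypothetical audit data: identical selection rates, but g1 gets more false positives.
ROWS: List[Row] = [
    ("ref", 0, 1), ("ref", 0, 0), ("ref", 0, 0), ("ref", 0, 0), ("ref", 1, 1), ("ref", 1, 1),
    ("g1", 0, 1), ("g1", 0, 1), ("g1", 0, 0), ("g1", 0, 0), ("g1", 1, 1), ("g1", 1, 0),
    ("g2", 0, 1), ("g2", 0, 0), ("g2", 0, 0), ("g2", 0, 0), ("g2", 1, 1), ("g2", 1, 1),
]

if __name__ == "__main__":
    metrics = group_metrics(ROWS)
    print(flag(disparities(metrics, "ref", "ppr")))  # []
    print(flag(disparities(metrics, "ref", "fpr")))  # ['g1']
```

The example shows why audits should configure several metrics: all three groups are selected at the same rate (0.5), yet group g1's false positive rate is twice the reference group's, which only the FPR check reveals.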
Debiasing should be a core practice that permeates business culture generally, not just the data science team. As McKinsey notes in this study, biases often have deeper roots than any specific algorithm or even any particular business process. They often stem from psychological, sociological, behavioral, and even physiological factors that predispose people, including data scientists, to take particular courses of action even when those don’t conform to some standard notion of fairness. Among other remedies, McKinsey recommends that organizations undertake regular “decision-conduct surveys” to identify bias “markers” that may subconsciously sway individuals and even whole organizations to unfairly disadvantage people with various protected attributes.
Bias isn’t an issue that can be “fixed” once and for all. But where decision-making algorithms come into the picture, organizations must always make biases as transparent as possible and attempt to eliminate any that perpetuate unfair societal outcomes.
Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine ...