Mitigating Bias in AI Using Debias-GAN

This report was originally published in September 2019.

Abstract

Today's AI and Machine Learning (ML) algorithms have achieved spectacular results in automating decisions that were traditionally made by humans. However, the data used for model training may be imbalanced and may introduce discriminatory biases against specific groups of people. Natural Language Processing (NLP) models are gaining popularity in contexts such as resume screening, college admission, emotion assessment, and repeat-offense prediction. Consequently, it is increasingly important to recognize the role they play in contributing to societal biases and stereotypes. NLP models trained on historical data are often not optimized to reduce implicit biases, and in some cases they perpetuate those biases further. Bias in machine learning models presents itself as a strong association among attributes that ought not to be correlated. In this white paper, we propose a general framework, debias-GAN, that addresses this issue by explicitly augmenting the training dataset of an NLP model with underrepresented instances synthesized by a pretrained sequence-generating model. As a proof of concept, we chose to experiment with a deep classification model that mimics decorrelation between user ethnicity and tweets. The synthetic data is produced by a targeted language model (LM) that generates realistic but user-ethnicity-oblivious tweets. We trained such debiased LMs with generative adversarial networks (GANs) through reinforcement learning (RL), adding a penalty term to the loss function so that the policy update discourages sequences that strongly indicate user ethnicity. The reward is provided by an independently trained classifier that identifies user ethnicity from tweets. We experimented with the ratio of synthetic to original data in the mixed training set and tested the debiasing impact using three fairness metrics. Debias-GAN improves the fairness metrics of the classifier by up to a factor of seven while maintaining classification performance.
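To make the penalty idea concrete, here is a minimal sketch of how a reward for a generated tweet could combine a realism score with a penalty for revealing user ethnicity, and how that reward could feed a REINFORCE-style policy loss. The abstract does not give the exact form of the penalty, so everything here is an assumption for illustration: the function names `penalized_reward` and `reinforce_loss`, the `penalty_weight` parameter, and the choice to measure "indication of ethnicity" as the classifier's top probability above the uniform baseline.

```python
def penalized_reward(realism, group_probs, penalty_weight=1.0):
    """Reward for one generated sequence (hypothetical form, not from the report).

    realism: discriminator score in [0, 1] for how real the sequence looks.
    group_probs: probabilities over ethnicity groups from an independently
        trained classifier applied to the same sequence.
    penalty_weight: assumed hyperparameter that scales the penalty.
    """
    k = len(group_probs)
    if k < 2:
        raise ValueError("need at least two groups")
    uniform = 1.0 / k
    # 0.0 when the classifier cannot tell the groups apart, 1.0 when it is certain.
    leak = (max(group_probs) - uniform) / (1.0 - uniform)
    return realism - penalty_weight * leak


def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for one sampled sequence.

    Minimizing this loss raises the log-probability of sequences whose reward
    exceeds the baseline and lowers it for those that fall below the baseline.
    """
    return -(reward - baseline) * sum(token_log_probs)
```

For example, a realistic tweet that the classifier cannot attribute to a group keeps its full realism score as reward, while an equally realistic tweet that the classifier attributes with certainty loses `penalty_weight` from its reward, so the policy update pushes the generator away from it.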
