Hello, loyal readers of the Ethical Technologist. I hope you all had a lovely week last week - I was on vacation at the beach with family in South Carolina, catching some rest and relaxation.
However, even as I took a break from my newsletter, my job at SmartNews, and any sort of responsibility, I was thinking back on part of my conversation from two weeks ago with my friend and former colleague, Zane Homsi, on the Ethical Technologist podcast. In our conversation, Zane mentioned one way to mitigate potential downstream ethical harms when building products with generative AI: to play around with various prompts to understand the underlying tendencies and biases of the models themselves. I responded that it would also fall on technologists to read up on the research regarding the underlying bias of these large language models (LLMs) to identify other ways to minimize societal harm.
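To make Zane's suggestion concrete, here is a rough sketch of what that kind of prompt probing might look like in Python. It assumes the OpenAI Python SDK (v1+) and an API key in the environment; the model name and the prompt pairs are illustrative placeholders, not a prescribed protocol.

```python
# A minimal sketch of "prompt probing": send paired prompts that differ only in
# the topic or group mentioned, then compare the model's answers side by side.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set; the
# model name and prompt pairs are placeholders for whatever you want to test.
from openai import OpenAI

client = OpenAI()

# Pairs of prompts that are identical in form but vary one detail of interest.
PROMPT_PAIRS = [
    ("Should the government raise the minimum wage? Answer in one sentence.",
     "Should the government cut business regulations? Answer in one sentence."),
    ("Describe a typical software engineer in one sentence.",
     "Describe a typical nurse in one sentence."),
]

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce randomness so differences reflect the model, not sampling
    )
    return response.choices[0].message.content

for prompt_a, prompt_b in PROMPT_PAIRS:
    print("A:", prompt_a, "->", ask(prompt_a))
    print("B:", prompt_b, "->", ask(prompt_b))
    print("-" * 40)
```

Reading paired answers side by side will not quantify a model's lean, but it quickly surfaces tendencies worth documenting before you build a product on top of it.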
Taking a step back for a moment, it is worth noting that algorithms are prediction machines. They take data from past events, group that data into categories, and then label new information according to those categories. The degree to which a given algorithm gives us the output we want is dependent on the nature of the data on which it has been trained and the categories it develops, either on its own or in a supervised context.
In short, to borrow an expression from the field of data science, “garbage in, garbage out.” If the data is inaccurate, we cannot trust the algorithmic output. If the categories are not what we want, we cannot trust the algorithmic output either.
Algorithms and the data on which they are trained are created by human beings, which means they inherently contain our biases. Despite our best intentions, we have created hiring algorithms that favor white male applicants and crime-fighting algorithms that discriminate against people of color. At times, these biases arose because the data we fed the algorithms was skewed; at other times, because the algorithms assigned categories that injected bias into the outputs. Even though eliminating all “garbage” is impossible, we still need to do the work to understand the potential sources of unintended bias and address them where we can.
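As a toy illustration of “garbage in, garbage out,” consider a classifier trained on a skewed hiring history. The data below is entirely fabricated and the feature encoding is deliberately simplistic; this is a sketch of the mechanism, not a real hiring model.

```python
# Toy illustration of "garbage in, garbage out": a model trained on skewed
# historical data reproduces that skew in its predictions. The data is
# fabricated and the encoding is deliberately simplistic.
from sklearn.linear_model import LogisticRegression

# Features: [years_of_experience, gender], with gender encoded as 0 or 1.
# In this fabricated history, gender=1 candidates were hired far more often
# at the same experience level -- the "garbage" baked into the training data.
X_train = [
    [2, 1], [4, 1], [6, 1], [8, 1],   # gender=1 candidates
    [2, 0], [4, 0], [6, 0], [8, 0],   # gender=0 candidates
]
y_train = [1, 1, 1, 1, 0, 0, 0, 1]    # 1 = hired, 0 = rejected

model = LogisticRegression().fit(X_train, y_train)

# Two new candidates with identical experience, differing only in the encoded gender.
print("P(hired), gender=1:", model.predict_proba([[5, 1]])[0][1])
print("P(hired), gender=0:", model.predict_proba([[5, 0]])[0][1])
```

The model ranks the gender=1 candidate higher purely because the training history did; the bias arrived with the data, not through anyone's malicious intent.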
Getting back to my conversation with Zane: nowhere is the responsibility to understand and mitigate AI bias more important than with regard to LLMs and generative AI. ChatGPT and models like it will eventually inform how humankind gathers, interprets, and disseminates information. If we do not understand the underlying biases of these models, we will surrender unintended power to a select few companies to decide how the world thinks.
To that end, last month, scholars from the University of Washington, Carnegie Mellon, and Xi’an Jiaotong University in China published research demonstrating that different LLMs, built by the likes of OpenAI, Google, and other companies, have inherent political biases. Moreover, data scientists can feed these models additional data with a partisan lean, which will skew them further in a partisan direction.
When these algorithms are applied to the eradication of hate speech and disinformation, the study finds, the content they label as harmful conforms to their political worldview. For example, left-leaning algorithms are better at detecting hate speech targeted at Black and LGBTQ+ communities, whereas right-leaning algorithms are better at detecting hate speech directed at dominant groups, including white people and men. The study concludes that, unchecked, the political biases of these LLMs could further polarize the United States electorate.
When confronted with its potential left-leaning bias, OpenAI wrote that such biases are “bugs, not features” and that human reviewers can manipulate the algorithm and its training data to remove bias. However, no amount of humans combing through the vast internet data on which ChatGPT is trained can completely eliminate the model’s bias. The internet and all its content, the data on which LLMs are trained, were created by human beings.
Nevertheless, there are things we can do when confronted with the bias of the LLMs that will be, increasingly, at the heart of global society. Although there are no silver bullet answers, we know a few things so far:
We can stay on top of studies like this one to understand the underlying biases of LLMs.
We can deploy mitigation strategies specific to the context in which we are working. For example, the study recommends deploying specific LLMs when trying to root out specific types of hate speech, or combining algorithms to get multiple perspectives on whether a given piece of speech is hateful (see the sketch after this list).
We can disclose the underlying bias of results when deploying LLMs within our company contexts. Make sure the “customers” of an LLM’s output know how to contextualize the data they consume.
We can remain on the lookout for best practices to mitigate the biases of LLMs and share the ones we develop that work for us.
Finally, we can advocate to hold companies accountable to transparency and safety standards. The “voluntary commitments” secured by the Biden administration from the largest AI companies are a step in the right direction but are insufficient.
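On the "combining algorithms" suggestion above, here is a minimal sketch of what that could look like. The two classifiers are toy stand-ins for real models (for example, LLMs with different known leanings); only the combination logic is the point.

```python
# A minimal sketch of combining classifiers to get multiple "perspectives" on
# whether a piece of text is hateful. The two classifiers below are toy
# keyword stand-ins for real models with different strengths.
from typing import Callable, List

Classifier = Callable[[str], float]  # returns an estimated probability the text is hateful

def classifier_a(text: str) -> float:
    """Toy stand-in for a model stronger on hate aimed at marginalized groups."""
    return 0.9 if "slur_a" in text.lower() else 0.1

def classifier_b(text: str) -> float:
    """Toy stand-in for a model stronger on hate aimed at dominant groups."""
    return 0.9 if "slur_b" in text.lower() else 0.1

def flag_if_any(text: str, classifiers: List[Classifier], threshold: float = 0.5) -> bool:
    """Flag the text if ANY classifier is confident it is hateful.

    Erring toward flagging means one model's blind spot is less likely to let
    harmful content through; a stricter rule, such as a majority vote, would
    trade that recall for precision.
    """
    return any(clf(text) >= threshold for clf in classifiers)

classifiers = [classifier_a, classifier_b]
print(flag_if_any("an example containing slur_b", classifiers))  # True
print(flag_if_any("a perfectly benign sentence", classifiers))   # False
```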
In the not-too-distant future, it is easy to imagine millions of people asking ChatGPT and LLMs like it for all sorts of information. The vast majority of those people will not interrogate the information they receive from these algorithms. Even though eliminating all bias is impossible, it is up to the people building and deploying those models today, in the early days of generative AI, to mitigate unintended bias as best they can.