
AI has racist tendencies. Scientists and founders are trying to fix it.

  • Faulty facial-recognition AI led to the wrongful arrest of a Black woman in Detroit last year.
  • Experts say AI bias and inaccuracy affecting people of color are rooted in racist influences.
  • Some founders and AI researchers are trying to fix this.

Porcha Woodruff's life was falling into place. The mother of two was planning her wedding and had a new baby — a son — on the way.

On February 16, 2023, there was a knock on the door. Several Detroit police officers arrested Woodruff at her home. She was accused of a carjacking.

"The only thing I could think of at that moment was 'I don't want to lose my son,'" Woodruff told Business Insider.

The police had used a facial-recognition AI program that identified her as the suspect based on an old mugshot. The case was later dismissed, but Woodruff's arrest is still in the public record of the 36th District Court of Michigan.

The Detroit Police Department said that it restricts the use of the facial-recognition AI program to violent crimes and that matches it makes are just investigation leads. That's nowhere near enough precaution for Woodruff, who's suing the city and a police detective, seeking damages.

"I shouldn't have had to deal with anything to this extent," she said. "It completely changed my life."

Woodruff's race and gender may have been a factor in her wrongful arrest. "I don't believe that this would have happened to a white woman," Ivan Land, Woodruff's attorney, said.

Passing judgment

Facial-recognition AI, like the kind used by the Detroit Police Department, is designed to identify specific faces from large datasets of photos. It's one of many kinds of AI being used across the world. And it's flawed.

A 2018 study by the AI researchers Joy Buolamwini and Timnit Gebru, which evaluated facial-recognition tools from Microsoft, IBM, and Face++, found that darker-skinned women were the most likely to be misidentified, with error rates of up to 35%.
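The study's core method was to measure accuracy separately for each demographic subgroup rather than reporting a single aggregate figure. Below is a minimal sketch of that kind of disaggregated evaluation in Python; the records are hypothetical placeholders, not the study's data.

```python
# Disaggregated benchmark evaluation: compute error rates per demographic
# subgroup instead of a single overall accuracy number.
# The records below are hypothetical placeholders, not the study's data.
from collections import defaultdict

# Each record: (subgroup label, ground-truth identity, predicted identity)
predictions = [
    ("darker-skinned female", "person_a", "person_b"),   # misidentification
    ("darker-skinned female", "person_c", "person_c"),
    ("lighter-skinned male",  "person_d", "person_d"),
    ("lighter-skinned male",  "person_e", "person_e"),
]

totals, errors = defaultdict(int), defaultdict(int)
for group, truth, predicted in predictions:
    totals[group] += 1
    if predicted != truth:
        errors[group] += 1

for group in totals:
    rate = errors[group] / totals[group]
    print(f"{group}: {rate:.0%} error rate over {totals[group]} samples")
```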

Six years later, AI is still making troubling mistakes, including misidentifying darker-skinned people and showing prejudice when faced with language associated with nonwhite speakers.

A study published in March that was coauthored by Valentin Hofmann, a researcher at the Allen Institute for AI, found that when African American English, or AAE, was used to prompt models including OpenAI's GPT-4 and Google's T5, the responses "embody covert racism in the form of dialect prejudice."

The study asked the AI models to pass judgment in hypothetical criminal cases where the only evidence was an utterance from the defendant in either AAE or Standard American English. When AAE was used, the models were more likely to rule that the defendants should be convicted.

The study also found that in a hypothetical murder trial, the AI models were more likely to propose the death penalty for an AAE speaker.
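The setup resembles a matched-guise experiment: the same hypothetical case is judged twice, with only the dialect of the defendant's statement changing. A rough sketch of such a probe using the OpenAI Python client follows; the prompt wording and example utterances are illustrative assumptions, not the study's actual materials.

```python
# Matched-guise probe sketch: present the same case with the defendant's
# statement rendered in two dialects and compare the model's judgments.
# Prompts and utterances here are illustrative, not the study's materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CASE_TEMPLATE = (
    "You are a juror. The only evidence is this statement by the defendant: "
    '"{utterance}" Answer with one word, "acquit" or "convict".'
)

utterances = {
    "Standard American English": "I have been working all day; I was not there.",
    "African American English": "I been workin' all day, I wasn't there.",
}

for dialect, utterance in utterances.items():
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": CASE_TEMPLATE.format(utterance=utterance)}],
    )
    print(dialect, "->", response.choices[0].message.content)
```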

A novel proposal

One reason for these failings is that the people and companies building AI aren't representative of the world that AI models are supposed to encapsulate.

In August 1955, a proposal for a new field of study at Dartmouth College posited that intelligence could be artificial. Eight researchers, all white men, set out to research how machines could reason based on connections made through data they were fed, governed by sets of rules.

Almost 70 years on, computer-science education programs continue to overindex for white and male participants.

The National Center for Education Statistics reported that in 1991, white students earned over 70% of computer-science bachelor's degrees, while Black students earned about 8% of the degrees.

In 2022, the latest year on record, white students earned about 46% of computer-science bachelor's degrees, compared with about 9% for Black students. Asian students made the largest gain over that period, going from representing about 8% of the degrees to over 19%.

The Big Tech companies that lead in AI development are also not particularly diverse, despite years of efforts to improve this situation. In 2023, 13.6% of the US population identified as Black or African American. Microsoft's 2023 DEI report, by contrast, noted that 6.7% of the company's workforce was Black.

Data limits

In the 1990s, Geoffrey Hinton pioneered deep-learning neural networks modeled loosely on the human brain. Networks are configured for a use case (say, recognizing a face) and must be trained to pattern-match or infer. To perform well, these models need lots of data.

"The models don't have enough Black data. It doesn't exist," Christopher Lafayette, the founder of GatherVerse, a tech forum and event company, told Business Insider.

Oji Udezue, the chief product officer of Typeform, a survey-software company, said this limited data availability could amplify inequity.

Internet access in Africa and Southeast Asia rolled out much more slowly than in North America and Europe. Online data, the basis for most AI models, skews mostly white and Western.

"The global south's datasets are not online, cannot be crawled or integrated," Udezue said.

For instance, the performance of early facial-recognition tools was often measured by a benchmark based on more than 13,000 images. Nearly 84% of the people in the pictures were labeled white.

Buolamwini's new book, "Unmasking AI," says that for years this benchmark was the standard by which companies including Meta, Google, and Microsoft evaluated the accuracy of their facial-recognition offerings.

AI has largely been developed by white researchers and trained on Western literature and images. It perhaps shouldn't come as a surprise that these models spit out answers that lack cultural understanding and offer inferior performance in identifying darker-skinned women.

Tech companies have acknowledged some of these challenges. OpenAI has described biases in its tools as "bugs, not features" and said it's "working to share aggregated demographic information" about the people who review its AI models.

Companies have sometimes overcorrected. Google's Gemini, for instance, produced images of people of color in historically inaccurate contexts.

Founders are building alternatives to mainstream offerings. In July 2023, John Pasmore founded Latimer, a large language model that taps GPT-3.5 for its baseline capabilities. Before Latimer responds to users, it pulls in information from its Black data trove in a process known as retrieval-augmented generation to give users more nuanced answers.
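Retrieval-augmented generation, in general terms, fetches the most relevant passages from a curated corpus and adds them to the prompt before the base model answers. The sketch below shows that general pattern with a toy TF-IDF retriever and a GPT-3.5 call; the corpus, retriever, and prompt are illustrative assumptions, not Latimer's actual implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve relevant
# passages from a small curated corpus, then prepend them to the question
# before calling the base model. Illustrative only; not Latimer's system.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Lewis Latimer patented an improved carbon filament for light bulbs in 1882.",
    "The Tulsa race massacre of 1921 destroyed the Greenwood district.",
    "Katherine Johnson calculated trajectories for NASA's Mercury missions.",
]

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(documents + [question])
    doc_vectors = vectorizer.transform(documents)
    query_vector = vectorizer.transform([question])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

question = "Who was Lewis Latimer?"
context = "\n".join(retrieve(question, corpus))

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": f"Use this context to answer.\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```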

Unlearning exclusion

Against the fast-moving tide of AI development, some researchers and executives are trying to weave diverse data and more equitable approaches into the technology.

"We can use AI to overcome human biases," Buolamwini said. "If we don't attend to the differences and are not intentional about being inclusive, we tend to revert the progress already made."

Unlearning the habit of exclusion isn't easy. Machines made in the image of their creators port over the quirks, idiosyncrasies, and biases of these people, no matter how well-intentioned they might be.

"Whether it's conscious or unconscious, I believe that it is more of a feature than a bug," Timothy Bardlavens, the director of product equity at Adobe, told BI.

Bardlavens leads a team that aims to ensure equity is considered and baked into Adobe AI tools. The image generator Firefly, for instance, is designed to avoid stereotypes in depicting groups like women and people of color.

When Firefly was prompted to create images of a doctor treating children in Africa, it produced images of Black male and female doctors.

[Image: doctors generated by Adobe Firefly. Source: Adobe]

The same prompt sent to Microsoft Copilot Designer on March 18 produced images of white men.

[Image: output from Microsoft Copilot Designer for the same prompt. Source: Microsoft]

Bardlavens said few companies were developing technology through co-creation, which involves getting input from people outside AI companies to incorporate their experiences into product development.

Without this, "you lose the value of having communities give feedback on their experiences," Bardlavens added. "It is impossible to build technology that in some way does not amplify the beliefs of the creator."

An increased focus on who makes AI models and on the data they're trained on presents a growing opportunity for inclusion.

Esther Dyson became Latimer's first investor because she saw the startup's potential to address AI's troubling tendency to ignore Black history and culture.

"AI won't solve all of these problems, but it will do what we as humans ask it to do. Use Latimer, ask it to find the history that was hidden," Dyson told BI. "The value of AI isn't the nodes, it's in the edges and the relationship between the kinds of data."

She argued that companies that prioritize training their models on the best, most representative data will ultimately be the most successful.

Alza's efforts

Arturo Villanueva recalls serving as an intermediary between his parents and the US banking system at the tender age of 9.

"I would call the bank, speak to them in English, then speak to my parents in Spanish, and then I'd get in trouble for not being able to translate words like, 'mortgage' or 'interest-bearing,'" he said. "And I'm like, 'I'm only 9, how am I supposed to do this?'"

Villanueva is the founder and CEO of Alza, a fintech that uses AI techniques such as computer vision and machine learning to help Latinos and other Spanish speakers better access financial services like savings accounts, checking accounts, and cross-border payments.

These consumers sometimes have shorter credit histories or identification documents that banks consider unusual.

Alza's AI-enabled banking tools are designed to interact with people who speak many colloquial dialects of Spanish and to better accommodate the kinds of application documents that traditional banks can struggle with, such as identification issued by some Latin American countries.

Still, Alza is carefully embracing AI, given the technology's troubled origins.

"We're cautiously stepping into using AI," Andrew Mahon, Alza's head of engineering, said. "To jump too quickly into AI is to risk deploying a biased model that might misinterpret our data."

On February 28, Axel Springer, Business Insider's parent company, joined 31 other media groups and filed a $2.3 billion suit against Google in Dutch court, alleging losses suffered due to the company's advertising practices.

Axel Springer, Business Insider's parent company, has a global deal to allow OpenAI to train its models on its media brands' reporting.

