AI trained to identify hate speech may actually end up amplifying racial bias.

Issue

AI Impact(s): Bias and discrimination

As Vox reported, "In one study, researchers found that leading AI models for processing hate speech were one-and-a-half times more likely to flag tweets as offensive or hateful when they were written by African Americans, and 2.2 times more likely to flag tweets written in African American [Vernacular] English (which is commonly spoken by Black people in the US)." Further investigation showed that "when moderators knew more about the person tweeting, they were significantly less likely to label that tweet as potentially offensive. At the aggregate level, racial bias against tweets associated with Black speech decreased by 11 percent." (…) "Another study found similar widespread evidence of racial bias against Black speech in five widely used academic data sets for studying hate speech that totaled around 155,800 Twitter posts."

AI Intersections Database

Mozilla is a global non-profit dedicated to putting you in control of your online experience and shaping the future of the web for the public good. Visit us at foundation.mozilla.org. Most content available under a Creative Commons license.