AI systems trained to identify hate speech may themselves end up amplifying racial bias.
As Vox reported, "In one study, researchers found that leading AI models for processing hate speech were one-and-a-half times more likely to flag tweets as offensive or hateful when they were written by African Americans, and 2.2 times more likely to flag tweets written in African American [Vernacular] English (which is commonly spoken by Black people in the US)." Further investigation showed that "when moderators knew more about the person tweeting, they were significantly less likely to label that tweet as potentially offensive. At the aggregate level, racial bias against tweets associated with Black speech decreased by 11 percent." (…) "Another study found similar widespread evidence of racial bias against Black speech in five widely used academic data sets for studying hate speech that totaled around 155,800 Twitter posts."