Twitter is a confusing place to hang out. It is where we make meaningful connections and spark conversations, and it is also the haunt of trolls. Insults are part of the Twitterverse, but for women they are more personal: when women tweet, they often become the target of misogynistic messages. It is so unpleasant that many people write Twitter off as a lost cause, but now there may be hope.
A research team from Queensland University of Technology in Australia has designed a statistical model to detect misogynistic tweets. Their goal? To drum abusive, threatening and demeaning sexist language out of the Twittersphere. They report that their algorithm can sift through millions of tweets to identify and block misogynistic content.
But is there a place for censorship like this? If you consider that much of the content is, frankly, abusive language or even threats of violence or rape, then yes. As a public forum, social media requires the same basic civility as any other public space. We are not discussing honest debate or even a bad word; we are talking about language meant to demean and silence people because of their gender.
An algorithm to tame Twitter
Misogynistic insults are rampant on Twitter. When I tweeted a poll asking women if they had endured misogynistic tweets, 75% responded, “It’s a big problem.”
“At the moment, the onus is on the user to report abuse they receive. We hope our machine-learning solution can be adopted by social media platforms to automatically identify and report this content to protect women and other user groups online,” said Associate Professor Richi Nayak in a press release.
To create a machine-learning system to detect abusive content, the research team started with a dataset of 1 million tweets and searched for the words whore, slut, and rape. Then they looked further, ultimately identifying about 5,000 of the million tweets as misogynistic.
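The article describes this as a two-stage process: a broad keyword filter, then closer inspection to label the truly misogynistic tweets. A minimal sketch of the first stage is below, using only the three seed terms named above; the function names, tokenization, and sample tweets are illustrative assumptions, not the team's actual pipeline, which has not been published in code form.

```python
# Illustrative sketch of a keyword pre-filter (not the QUT team's code).
# Seed terms are the three words the article says the team searched for.
SEED_TERMS = {"whore", "slut", "rape"}

def contains_seed_term(tweet: str) -> bool:
    """Return True if any seed term appears as a word in the tweet."""
    tokens = tweet.lower().split()
    return any(token.strip(".,!?\"'") in SEED_TERMS for token in tokens)

# Hypothetical sample data standing in for the 1-million-tweet dataset.
tweets = [
    "Great talk at the conference today!",
    "Get back to the kitchen, whore",
]

# Stage 1: keyword filter narrows millions of tweets to candidates.
# Stage 2 (not shown) would be the closer, context-aware labeling.
candidates = [t for t in tweets if contains_seed_term(t)]
```

A filter this crude flags sarcastic or self-referential uses along with genuine abuse, which is exactly why the second, context-aware stage matters.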
The issue in taking that from a human to a program is all about context. “The key challenge in misogynistic tweet detection is understanding the context of a tweet. The complex and noisy nature of tweets makes it difficult. On top of that, teaching a machine to understand natural language is one of the more complicated ends of data science: language changes and evolves constantly, and much of meaning depends on context and tone,” explained Nayak.
The team developed a text mining system that allowed the algorithm to learn language as it went, starting with a basic understanding and then growing that vocabulary with language that is tweet-specific or abusive. The team continually monitored context and intent to make sure the algorithm could tell the difference between sarcastic or friendly use of terms and actual abuse.
“We implemented a deep learning algorithm called Long Short-Term Memory with Transfer Learning, which means that the machine could look back at its previous understanding of terminology and change the model as it goes, learning and developing its contextual and semantic understanding over time,” said Nayak.
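The "looking back" Nayak describes is the defining feature of an LSTM: a memory cell carries information forward from earlier words, so later words are read in context. Below is a minimal single-unit LSTM cell in plain Python to show that mechanism; the team's real model, weights, tokenization, and transfer-learning setup are not public, so everything here (weight values, inputs) is illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Toy single-unit LSTM cell. The cell state c is the 'memory' that
    lets the network carry earlier context forward through a tweet."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One (input weight, recurrent weight, bias) triple per gate.
        # A trained model would learn these from labeled tweets.
        self.w = {g: (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0)
                  for g in ("forget", "input", "candidate", "output")}

    def step(self, x, h, c):
        def gate(name, act):
            wx, wh, b = self.w[name]
            return act(wx * x + wh * h + b)
        f = gate("forget", sigmoid)       # how much old memory to keep
        i = gate("input", sigmoid)        # how much new info to write
        g = gate("candidate", math.tanh)  # the new information itself
        o = gate("output", sigmoid)       # how much memory to expose
        c = f * c + i * g                 # updated cell state (memory)
        h = o * math.tanh(c)              # hidden state for the next step
        return h, c

cell = LSTMCell()
h = c = 0.0
for x in [0.2, -0.4, 0.9]:  # stand-in for embedded tokens of a tweet
    h, c = cell.step(x, h, c)
```

The "transfer learning" part of the quote would correspond to initializing such a network from a model already trained on general text, then continuing training on tweets so its understanding shifts toward tweet-specific and abusive language.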
“Take the phrase ‘get back to the kitchen’ as an example—devoid of context of structural inequality, a machine’s literal interpretation could miss the misogynistic meaning,” he said. “But seen with the understanding of what constitutes abusive or misogynistic language, it can be identified as a misogynistic tweet.”
Eventually, the algorithm was able to identify misogynistic content with 75% accuracy, beating other methods that look at social media language. “We were very happy when our algorithm identified ‘go back to the kitchen’ as misogynistic—it demonstrated that the context learning works,” said Nayak.
When will Twitter be safer for women?
The research team’s ultimate goal is to “take the model to social media platforms and trial it in place. If we can make identifying and removing this content easier, that can help create a safer online space for all users,” said Nayak. Hopefully that happens soon; sadly, the research team found plenty of misogynistic data to use for their project.
Their hope is that a machine-learning algorithm like this could also be used more widely, to help identify and deter racist, homophobic or abusive statements toward people with disabilities.