How Censorship Can Influence Artificial Intelligence

Artificial intelligence is hardly confined by international borders, as businesses, universities, and governments tap a global pool of ideas, algorithms, and talent. Yet the AI programs that result from this global gold rush can still reflect deep cultural divides.

New research shows how government censorship affects AI algorithms—and can influence the applications built with those algorithms.

Margaret Roberts, a political science professor at UC San Diego, and Eddie Yang, a PhD student there, examined AI language algorithms trained on two sources: the Chinese-language version of Wikipedia, which is blocked within China; and Baidu Baike, a similar site operated by China’s dominant search engine, Baidu, that is subject to government censorship. Baidu did not respond to a request for comment.

The researchers were curious whether censorship of certain words and phrases could be learned by AI algorithms and find its way into software that uses those algorithms. This might influence the language that a chatbot or a voice assistant uses, the phrasing by a translation program, or the text of autocomplete tools.

The type of language algorithm they used learns by analyzing how words co-occur in large quantities of text. It represents each word as a point in a geometric space; the closer two words sit, the more similar their meaning.

A translation program might infer the meaning of an unknown word by looking at these relationships in two different languages, for example.
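To make the idea concrete, here is a minimal sketch in Python using the open-source gensim library. The toy corpus, parameters, and word choices are illustrative only, not the researchers' actual setup.

```python
# Minimal sketch of co-occurrence-based word embeddings (word2vec).
# The toy corpus and parameters here are illustrative, not the study's setup.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
    ["a", "dog", "chased", "a", "ball"],
]

# Training turns each word into a point in a 50-dimensional space.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                 epochs=200, seed=1)

# Words used in similar contexts ("cat" and "dog") should score as more
# similar than words used in unrelated contexts.
print(model.wv.similarity("cat", "dog"))
print(model.wv.similarity("cat", "mat"))
```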

The UC San Diego researchers found key differences in the resulting algorithms that, they said, seem to reflect the information censored in China. For example, the algorithm trained on Chinese Wikipedia represented “democracy” closer to positive words such as “stability,” while the one trained on Baidu Baike represented “democracy” closer to “chaos.”
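In embedding terms, a gap like that can be measured directly: load each set of trained vectors and compare how close “democracy” sits to other words. Here is a sketch of that kind of probe; the file names are hypothetical placeholders, and a real probe of these corpora would use the Chinese words.

```python
# Sketch of a similarity probe across two sets of pretrained vectors.
# The file names are hypothetical placeholders; the study's corpora are in
# Chinese, so the actual query words would be Chinese.
from gensim.models import KeyedVectors

for path in ["wikipedia_zh.vec", "baidu_baike.vec"]:  # hypothetical files
    wv = KeyedVectors.load_word2vec_format(path)
    print(path,
          "democracy~stability:", wv.similarity("democracy", "stability"),
          "democracy~chaos:", wv.similarity("democracy", "chaos"))
```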

Roberts and Yang then used the algorithms to build two programs that assess the sentiment (the positive versus negative meaning) of news headlines. They found that the program trained on Chinese Wikipedia assigned more positive scores to headlines mentioning terms including “election,” “freedom,” and “democracy,” while the one trained on Baidu Baike assigned more positive scores to headlines featuring “surveillance,” “social control,” and “CCP.” The study will be presented at the 2021 Conference on Fairness, Accountability, and Transparency (FAccT) in March.
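The article does not detail how those sentiment programs were built, but a common recipe is to represent each headline as the average of its word vectors and fit a simple classifier on labeled examples. Below is a minimal sketch under that assumption, with stand-in embeddings and made-up headlines; the study's exact pipeline may differ.

```python
# Sketch of an embedding-based sentiment scorer: average a headline's word
# vectors, then fit a logistic-regression classifier. The embeddings and
# labeled headlines below are stand-ins, not the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-in vectors; in the real setting these would come from embeddings
# trained on Chinese Wikipedia or on Baidu Baike.
wv = {w: rng.normal(size=50) for w in
      ["markets", "rally", "factory", "fire", "election", "held"]}

def headline_vector(tokens, dim=50):
    """Represent a headline as the average of its words' vectors."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Hypothetical labeled headlines: 1 = positive, 0 = negative.
headlines = [["markets", "rally"], ["factory", "fire"]]
labels = [1, 0]

X = np.stack([headline_vector(h) for h in headlines])
clf = LogisticRegression().fit(X, labels)

# The positive-class probability serves as the headline's sentiment score.
print(clf.predict_proba([headline_vector(["election", "held"])])[0, 1])
```

Because the only features are averaged embeddings, whatever associations the embeddings carry, such as “democracy” sitting nearer “chaos,” flow directly into the sentiment scores.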

In recent years, researchers have highlighted how race and gender biases can lurk in many artificial intelligence systems. Algorithms trained on text scraped from the web or old books, for instance, will learn to replicate the biases displayed by the human authors of that text. In 2018, researchers at Google demonstrated cultural biases in image recognition algorithms, which may, for example, recognize only Western wedding scenes.

Roberts notes that the differences seen in their study may not be due entirely to government censorship. Some may be the result of self-censorship or simply cultural differences between those writing the encyclopedia articles. But she says it is important to recognize that government policy can cause other forms of bias to lurk in AI systems. “We see this as a starting point for trying to understand how government-shaped training data appears within machine learning,” Roberts says.

Roberts says researchers and policymakers need to consider how governments in the future might influence how AI systems are trained in order to make censorship more effective or export particular values.

Graeme Hirst, a professor at the University of Toronto who specializes in computational linguistics and natural language processing, has qualms about the study's methodology. Without carefully studying the differences between Chinese Wikipedia and Baidu Baike, Hirst says, it is hard to ascribe the variations in the algorithms to censorship. It is also possible that Chinese Wikipedia contains anti-Chinese or overtly pro-democracy content, he says. Hirst adds that it is unclear how the sentiment analysis was done and whether bias may have been introduced there.

Others see it as a welcome contribution to the field.

“In a certain sense, this is not surprising,” says Suresh Venkatasubramanian, a professor at the University of Utah who studies AI ethics and cofounded the FAccT conference.

Venkatasubramanian points out that AI algorithms trained on Western news articles might contain their own anti-China biases. “But I think it’s still important to do the work to show it happening,” he says. “Then you can start asking how it shows up, how do you measure it, what does it look like and so on.”

