How we interact with language in the metaverse could set the tone for its future

"Heads up. Such conversations can be intense. Don't forget the person behind the screen."
Twitter’s dialogue warning is the latest in a protracted battle to help us be more polite to each other online. Perhaps even more disturbing is the fact that we are training large-scale AI language models with data from often toxic online conversations. No wonder we see the bias coming back to us in machine-generated language. As we build the metaverse – basically the next version of the web – what if we use AI to filter toxic dialogue for good?
A facetune for language?
At the moment, researchers are doing a great deal of work with AI language models to fine-tune their accuracy. In multilingual translation models, for example, a human in the loop can make a huge difference. Human editors can check whether cultural nuances are properly reflected in a translation and effectively train the algorithm to avoid similar mistakes in the future. Think of people as a tuning mechanism for our AI systems.
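To make that loop concrete, here is a minimal sketch in Python of how human post-edits could be collected and turned into fine-tuning data for a translation model. The `translate` function and the JSONL output format are placeholders standing in for whatever MT system and training pipeline a team actually uses; only the shape of the feedback cycle is the point.

```python
# A minimal sketch of a human-in-the-loop feedback cycle for machine translation.
# The model call is a placeholder (hypothetical `translate` function); in practice
# it would wrap whatever MT system is in use. The loop is the point:
# machine output -> human post-edit -> new fine-tuning example.

from dataclasses import dataclass
import json

@dataclass
class PostEdit:
    source: str            # original sentence
    machine_output: str    # raw model translation
    human_edit: str        # translation after human review

def translate(text: str) -> str:
    """Placeholder for a real MT model call (hypothetical)."""
    return text  # identity "translation" just to keep the sketch runnable

def collect_feedback(sources, review) -> list[PostEdit]:
    """Run the model, ask a human reviewer to post-edit, keep the corrections."""
    edits = []
    for src in sources:
        draft = translate(src)
        corrected = review(src, draft)   # human in the loop
        if corrected != draft:           # only changed sentences carry signal
            edits.append(PostEdit(src, draft, corrected))
    return edits

def export_finetuning_data(edits: list[PostEdit], path: str) -> None:
    """Write corrections as JSONL that a fine-tuning job could consume."""
    with open(path, "w", encoding="utf-8") as f:
        for e in edits:
            f.write(json.dumps({"source": e.source, "target": e.human_edit}) + "\n")

if __name__ == "__main__":
    # Stand-in reviewer: pretend the human fixed a cultural nuance in one sentence.
    def fake_reviewer(src, draft):
        return draft.replace("football", "soccer") if "football" in draft else draft

    data = collect_feedback(["They watched football all weekend."], fake_reviewer)
    export_finetuning_data(data, "post_edits.jsonl")
```

The design choice worth noting is that only sentences the reviewer actually changed are kept: unchanged output carries little corrective signal, and the smaller, targeted dataset is exactly what the next fine-tuning pass needs.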
If you imagine the metaverse as some sort of scaled-up SimCity, this type of AI translation could instantly make us all multilingual when we talk to each other. A borderless society could level the playing field for people (and their avatars) who speak less widely spoken languages, and potentially promote greater intercultural understanding. It could even open up new opportunities for international trade.
There are serious ethical questions that come with using AI as a facetune for language. Yes, we can introduce some control over language style, flag instances where models don’t perform as expected, or even change the literal meaning. But how far is too far? How do we continue to nurture diversity of opinion while limiting offensive or abusive speech and behavior?
A framework for algorithmic fairness
One way to make language algorithms less biased is to train them on synthetic data in addition to data from the open internet. Synthetic data can be generated from relatively small “real” datasets.
Synthetic datasets can be created to represent the real-world population (not just those who speak loudest on the internet). It is relatively easy to see where the statistical properties of a given dataset are skewed, and therefore where synthetic data can best be used.
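As a rough illustration, assume each training example carries a language or demographic label; comparing observed shares against target shares then reveals where the dataset is skewed and roughly how much synthetic data would be needed to rebalance it. The category names and target proportions below are made up for the sketch, not real figures.

```python
# A rough sketch of auditing a text dataset's composition against real-world
# population shares, to see where synthetic examples could fill the gaps.
# Category names and target proportions here are illustrative, not real figures.

from collections import Counter

# Target shares we would like the training data to reflect (illustrative).
population_share = {"en": 0.25, "es": 0.20, "zh": 0.30, "sw": 0.15, "other": 0.10}

# Each training example is tagged with a language/group label (toy data).
training_labels = ["en"] * 700 + ["es"] * 150 + ["zh"] * 100 + ["sw"] * 20 + ["other"] * 30

def coverage_gaps(labels, targets):
    """Compare observed proportions with targets and report under-represented groups."""
    counts = Counter(labels)
    total = len(labels)
    gaps = {}
    for group, target in targets.items():
        observed = counts.get(group, 0) / total
        if observed < target:
            # Approximate number of synthetic examples needed to reach the target share.
            gaps[group] = round((target - observed) * total)
    return gaps

if __name__ == "__main__":
    for group, needed in coverage_gaps(training_labels, population_share).items():
        print(f"{group}: under-represented, ~{needed} synthetic examples to add")
```

The counts are only approximate, since adding synthetic examples also grows the total, but they are enough to point a data-generation effort at the right gaps.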
All of this raises the question: Will synthetic data be a critical part of making virtual worlds fair and equitable? Can our decisions in the metaverse even affect how we think about and talk to each other in the real world? If the endgame of these technological decisions is a more civilized global discourse that helps us understand each other, synthetic data could be worth its algorithmic weight in gold.
But tempting as it may be to think we can push a button, improve behavior, and build a virtual world in a brand-new image, this isn't a matter for technologists to decide alone. It's unclear whether companies, governments, or individuals will control fairness rules and standards of conduct in the metaverse. With so many conflicting interests in the mix, it makes sense to listen to leading tech experts and consumer advocates on how to proceed. It may be an illusion to assume that a consortium uniting all of those competing interests will simply emerge, but it is imperative that we create one and start the discussion about unbiased language AI now. Each year of inaction means dozens, if not hundreds, of metaverses will have to be retrofitted to meet whatever standards eventually arrive. What it means to build a truly accessible virtual ecosystem needs to be discussed before mass adoption of the metaverse, which will be here before we know it.
Vasco Pedro is co-founder and CEO of the AI-powered language processing platform Unbabel. He spent more than ten years in academic research focused on language technologies and previously worked at Siemens and Google, where he helped develop technologies to better understand data computation and language.
This post was originally published at https://venturebeat.com/2022/03/13/how-we-handle-language-in-the-metaverse-could-set-the-tone-for-its-future/