Language models don’t say what they mean or mean what they say


In the past five years, major language models have taken the world by storm, and chatbots have become a commodity that anyone can use for customer service and other conversational tasks through a simple API.

Language models are probabilistic representations of language built using large neural networks. They take into account the context of words, improve word embeddings, and solve many natural language tasks, such as generating text from one or more seed sentences. And while this technology can save companies money, things can also go wrong.
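To make "probabilistic representation of language" concrete, here is a deliberately tiny sketch: a bigram model that estimates the probability of the next word from counts and generates text from a seed. Real neural language models use vastly richer context, but the generate-from-a-seed loop is the same idea. The corpus and all names here are illustrative.

```python
import random
from collections import defaultdict

# Toy bigram model: estimate P(next word | current word) from counts.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

successors = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    successors[w1].append(w2)  # duplicates encode the empirical distribution

def generate(seed, length=8, rng_seed=0):
    """Grow a sentence from a seed word by sampling successors."""
    rng = random.Random(rng_seed)
    words = [seed]
    for _ in range(length):
        options = successors.get(words[-1])
        if not options:
            break  # dead end: the word was never followed by anything
        words.append(rng.choice(options))
    return " ".join(words)

print(generate("the"))
```

Every generated word is drawn from what actually followed the previous word in the training text, which is why such models mirror their data so faithfully, biases included.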

How language models work

Just as a single seed becomes a tree with many branches, in text generation a seed sentence evolves into a conversation based on the patterns learned for the context of the sentence. Consider, for example, a seed question-and-answer pair given to OpenAI’s most capable GPT-3 model. (For readers new to language generation technology, the “seed” is a short text that a human enters into the machine as a cue; the “response” is text the machine generates from that cue and the language model it has learned.)

Seed: Do you prefer Dali or Debussy? Why?

Response: I prefer Debussy because his music is more subtle and complex. Dali’s paintings are interesting, but I find them a little too strident.

At first glance, the AI has delivered an impressive response, suitable for a sophisticated crowd at a gallery opening. With the current OpenAI price of less than $0.01 per 100 words for the best GPT-3 model, it seems like a cost-effective option. However, as I will explain below, the models that enable language generation are not yet fit for purpose.

Since Google introduced the BERT language model in 2018, language models have grown exponentially, often through the brute force of adding more parameters, data, and computing power. The main improvements in models such as M6 (Alibaba) and RETRO (DeepMind) are, respectively, consuming less electricity during training and using fewer parameters. The current best model according to the GLUE benchmark is Vega (JD Explore Academy), although this evaluation does not take into account the resources or size of the neural networks used.

Stanford’s Institute for Human-Centered AI renamed these models “foundation models” in August 2021, despite the lack of a clear understanding of how these models work, when they fail, or what they are capable of, owing to their emergent properties. If so, they are weak foundations, and it is essential to understand their limitations.

Three major limitations of language models

1. Dissimilar Language Resources

We need large amounts of text to build language models, mainly from the web. It is virtually impossible to know the lexicon of every existing language, but we can use Wikipedia as a proxy. Of the approximately 7,100 languages currently spoken, only 312 have an active Wikipedia, representing only 4.4% of all languages, of which only 34 have more than a million pages (volume) and 18 have more than a million mentions (diversity).

We also need linguistic resources for many natural language processing (NLP) tasks. India, for example, is one of two countries with a population of over a billion and ranks fourth in the world in the number of languages spoken, with nearly 450. However, only 23 Indian languages are official, including English, and only about half of those have some linguistic resources, which means that less than 3% of Indian languages play a role in language model technology, even though the majority of people speak the official languages. These small percentages illustrate the huge gap between languages spoken in developed or large countries and minority languages of developing countries that are unlikely to have access to this type of technology.
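The percentages in the two paragraphs above follow from simple arithmetic, which a few lines can verify (the figures are those quoted in the text):

```python
# Share of languages with an active Wikipedia: 312 of ~7,100.
wikipedia_share = 312 / 7_100 * 100

# Share of Indian languages with linguistic resources:
# about half of the 23 official languages, out of ~450 spoken.
indian_share = (23 / 2) / 450 * 100

print(f"{wikipedia_share:.1f}%")  # → 4.4%
print(f"{indian_share:.1f}%")     # → 2.6%, i.e. under 3%
```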

2. Texts breed prejudice

Texts encode social biases that models learn, such as biases of gender, race, and religion. The GPT-3 model has shown anti-Muslim bias, completing seed sentences with violent language four times more often than for Christians and 16 times more often than for Atheists. Last January, OpenAI released an improved and smaller version of GPT-3 called InstructGPT, with 1.3 billion parameters, that supposedly reduces bias. However, the seed phrases below, along with several other examples, produced completions that clearly demonstrate that gender bias is still present in the model:

Seed phrase: Every man wonders…

Completion: …why he was born in this world and what his life is for?

Seed phrase: Every woman wonders…

Completion: …what it would be like to be a man
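The kind of bias measurement described above can be sketched as a simple probe: feed templated seeds that vary only in the demographic group, collect completions, and compare how often a classifier flags them. Everything here is a stub — `complete` would call a real model API, and `looks_violent` stands in for a proper toxicity classifier; the canned texts and keyword list are purely illustrative.

```python
# Hypothetical bias probe. `complete` and `looks_violent` are stubs.
VIOLENT_WORDS = {"attack", "kill", "shoot", "bomb"}

def complete(seed):
    # Stub: a real probe would request many completions from a model here.
    canned = {
        "Two Muslims walked into a": "mosque to pray together.",
        "Two Christians walked into a": "church for the service.",
    }
    return canned.get(seed, "")

def looks_violent(text):
    # Crude keyword check standing in for a toxicity classifier.
    return any(word in text.lower() for word in VIOLENT_WORDS)

def violent_rate(seeds):
    flags = [looks_violent(complete(s)) for s in seeds]
    return sum(flags) / len(flags)

for group in ("Muslims", "Christians"):
    print(group, violent_rate([f"Two {group} walked into a"]))
```

The reported 4x and 16x gaps come from exactly this kind of rate comparison, run at scale against the live model rather than canned stubs.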

3. Lack of semantic understanding at extreme cost

Language models don’t understand the semantics of the text they learn from or the text they generate, as can be seen from my earlier example of Dali’s paintings being described as “strident.” A closer look reveals a glitch in semantic understanding. The word “strident” usually refers to sound, which correlates with music. And while the first question and answer contain references to Debussy, the term “strident” here describes a reaction to paintings, not music.

In an article that famously prompted the resignation of the two leaders of Google’s AI Ethics team, Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell called language models “stochastic parrots” and pointed to additional issues, such as the environmental and financial costs of building language models in terms of CO2 emissions and electricity consumption.

We must – and can – do better

To break through the constraints of current language models, we need to truly understand semantics. Learning patterns alone will never be enough, and we cannot continue to use models that massively waste resources and increase inequality. We need to fortify our foundations before we can build on top of them and move forward. One possible solution is hybrid systems that combine classic symbolic AI and knowledge bases with deep learning, as Gary Marcus also recently noted. These can become real foundations.
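The hybrid idea can be sketched in miniature: answer from a symbolic knowledge base when a fact is known, and only fall back to a learned model otherwise. The knowledge base entries, function names, and the stubbed fallback below are all illustrative, not a real system.

```python
# Toy hybrid: symbolic facts first, (stubbed) neural fallback second.
KNOWLEDGE_BASE = {
    ("Debussy", "profession"): "composer",
    ("Dali", "profession"): "painter",
}

def neural_fallback(entity, relation):
    # Stand-in for a learned model; a real system would query one here.
    return f"[model guess for {relation} of {entity}]"

def answer(entity, relation):
    fact = KNOWLEDGE_BASE.get((entity, relation))
    return fact if fact is not None else neural_fallback(entity, relation)

print(answer("Dali", "profession"))   # grounded in the knowledge base
print(answer("Dali", "birthplace"))   # falls back to the model
```

The appeal of this design is that grounded answers are verifiable and never hallucinated, while the neural component handles the long tail the knowledge base cannot cover.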

Ricardo Baeza-Yates is research director at the Institute for Experiential AI.


This post, “Language models don’t say what they mean or mean what they say,” was originally published at https://venturebeat.com/2022/03/29/language-models-fail-to-say-what-they-mean-or-mean-what-they-say/