Web-searchable language models hold promise — but also raise concerns
Language models — AI systems that can be asked to write essays and emails, answer questions and more — remain flawed in many ways. Because they “learn” to write from examples on the Internet, including problematic social media posts, they are prone to generating misinformation, conspiracy theories, and racist, sexist, or otherwise toxic language.
Another important limitation of many current language models is that they are, in a sense, “stuck in time.” Once trained on a large collection of text from the Internet, their knowledge of the world — which they derive from that collection — can quickly become outdated, depending on when they were deployed. (In AI, “training” refers to teaching a model to correctly interpret data and learn from it to perform a task, in this case generating text.) For example, You.com’s writing aid tool — powered by OpenAI’s GPT-3 language model, which was trained in the summer of 2020 — answers the question “Who is the president of the US?” with “The current president of the United States is Donald Trump.”
The solution, some researchers suggest, is to give language models access to web search engines like Google, Bing, and DuckDuckGo. The idea is that these models can simply search for the latest information on a particular topic (e.g. the war in Ukraine) instead of relying on old, factually wrong data to come up with their text.
In an article published earlier this month, researchers at DeepMind, the AI lab backed by Google parent company Alphabet, describe a language model that answers questions by using Google Search to surface a list of relevant, recent web pages. After summarizing the first 20 web pages into six-sentence paragraphs, the model selects the 50 paragraphs most likely to contain high-quality information; generates four “candidate” answers for each of those 50 paragraphs (for a total of 200 answers); and determines the “best” answer using an algorithm.
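To make the pipeline concrete, here is a minimal sketch of the retrieve-summarize-rank-answer loop described above. It is an illustration, not DeepMind’s actual code: the helper functions (web_search, summarize, relevance_score, generate_answers, answer_score) are hypothetical placeholders standing in for the search API and the language-model calls the paper relies on.

```python
# Hypothetical sketch of the retrieve-summarize-rank-answer pipeline described above.
# All helpers below are placeholders, not DeepMind's implementation.
from __future__ import annotations


def web_search(question: str, n_pages: int = 20) -> list[str]:
    """Return the text of the top n_pages search results (placeholder)."""
    return []


def summarize(page_text: str, sentences: int = 6) -> str:
    """Condense a page into a short, roughly six-sentence paragraph (placeholder)."""
    return page_text[:500]


def relevance_score(question: str, paragraph: str) -> float:
    """Estimate how likely a paragraph is to contain high-quality evidence (placeholder)."""
    return 0.0


def generate_answers(question: str, paragraph: str, k: int = 4) -> list[str]:
    """Ask the language model for k candidate answers conditioned on the paragraph (placeholder)."""
    return ["" for _ in range(k)]


def answer_score(question: str, answer: str) -> float:
    """Score a candidate answer, e.g. by model likelihood (placeholder)."""
    return 0.0


def answer_question(question: str) -> str:
    pages = web_search(question, n_pages=20)                  # top 20 result pages
    paragraphs = [summarize(p, sentences=6) for p in pages]   # six-sentence summaries
    # Keep the 50 paragraphs judged most likely to contain high-quality information.
    top = sorted(paragraphs, key=lambda p: relevance_score(question, p), reverse=True)[:50]
    # Draft four candidate answers per paragraph (up to 200 total), then pick the best one.
    candidates = [a for p in top for a in generate_answers(question, p, k=4)]
    return max(candidates, key=lambda a: answer_score(question, a), default="")
```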
While the process may sound complicated, the researchers claim it vastly improves the factual accuracy of the model’s answers — by as much as 30% — for questions that can be answered using information contained in a single paragraph. The accuracy improvements were smaller for multi-hop questions, which require models to gather information from different parts of a web page. But the co-authors note that their method can be applied to virtually any AI language model without much modification.
OpenAI’s WebGPT searches the Internet for answers to questions and cites the sources.
“By using a commercial engine as our retrieval system, we have access to up-to-date information about the world. This is especially beneficial as the world evolves and the knowledge of our language models becomes outdated… Improvements weren’t just limited to the largest models; we saw performance improvements across the board, across model sizes,” the researchers wrote, referring to the parameters in the models they tested. In the field of AI, models with a large number of parameters — the parts of the model learned from historical training data — are considered “large,” while “small” models have fewer parameters.
The common belief is that larger models outperform smaller ones — a view challenged by recent work from several labs, including DeepMind. Could it be that, instead, what all language models really need is access to a wider range of information?
There is some external evidence to support this. For example, researchers at Meta (formerly Facebook) developed a chatbot, BlenderBot 2.0, that improved upon its predecessor by searching the Internet for up-to-date information on things like movies and TV shows. Meanwhile, Google’s LaMDA, which is designed to have conversations with people, “checks” itself by searching the Internet for resources. Even OpenAI has explored the idea of models that can search and navigate the web — the lab’s “WebGPT” system used Bing to find answers to questions.
New risks
But while web search opens up many opportunities for AI language systems, it also brings new risks.
The “live” web is less curated than the static datasets historically used to train language models and, by implication, less filtered. Most labs developing language models make an effort to identify potentially problematic content in their training data to minimize potential future problems. For example, in creating an open source text dataset containing hundreds of gigabytes of web pages, the research group EleutherAI claims to have performed “extensive bias analysis” and made “hard editorial decisions” to exclude data that it believed was “unacceptably negatively biased” toward certain groups or views.
The live web can of course be filtered to some extent. And, as the DeepMind researchers point out, search engines like Google and Bing use their own “security mechanisms” to reduce the chances of untrustworthy content appearing at the top of results. But these results can be gamed — and aren’t necessarily representative of the web as a whole. As a recent piece in The New Yorker notes, Google’s algorithm prioritizes websites that use modern web technologies, such as encryption, mobile support, and schema markup. Many websites with otherwise high-quality content get lost in the shuffle as a result.
This gives search engines a lot of power over the data that can inform the answers of web-connected language models. Google has been found to prioritize its own services in Search by, for example, answering a travel question with data from Google Places rather than a richer, more social resource like TripAdvisor. At the same time, the algorithmic approach to search opens the door for bad actors. According to The New Yorker, in 2020 Pinterest took advantage of a quirk of Google’s image search algorithm to show more of its content in Google Images.
Labs could instead have their language models use off-the-beaten-path search engines like Marginalia, which scours the web for less-visited, mostly text-based websites. But that wouldn’t solve another major problem with Internet-connected language models: depending on how the model is trained, it may be incentivized to select data from sources it expects users to find persuasive — even if those sources aren’t objectively the strongest.
The OpenAI researchers encountered this while evaluating WebGPT, which they say led the model to sometimes quote from “highly unreliable” sources. They found that WebGPT inherited the biases of the model on which it was based (GPT-3), and that this influenced the way it chose to search for and synthesize information on the web.
“Search and synthesis both depend on the ability to include and exclude material depending on some measure of its value, and by incorporating the biases of GPT-3 when making these decisions, WebGPT can be expected to perpetuate them further,” the OpenAI researchers wrote in a study. “[WebGPT’s] answers also seem more authoritative, partly due to the use of citations. Combined with the well-documented problem of ‘automation bias,’ this could lead to overconfidence in WebGPT’s answers.”
Facebook’s BlenderBot 2.0 searches the web for answers.
Automation bias, for context, is people’s tendency to trust data from automated decision-making systems. Too much transparency about a machine learning model and people become overwhelmed; too little, and people make incorrect assumptions about the model, giving them a false sense of confidence.
Solutions to the limitations of language models searching the web remain largely unexplored. But as the desire for more capable, more informed AI systems grows, the problems will become more pressing.
This post was originally published at https://venturebeat.com/2022/03/19/language-models-that-can-search-the-web-hold-promise-but-also-raise-concerns/