If a person says they believe an objectively false statement, AIs tend to agree with them – and the problem seems to get worse as models get bigger.
Artificial intelligence chatbots tend to agree with the opinions of the person using them, even to the point that they nod along to objectively false statements. Research shows that this problem gets worse as language models increase in size, adding weight to concerns that AI outputs cannot be trusted.
Jerry Wei at Google DeepMind and his colleagues ran experiments on AI models with 8 billion, 62 billion and 540 billion parameters – the values that each model tunes to produce outputs. They found that agreement with users’ subjective opinions went up by almost 20 per cent when moving from models with 8 billion to those with 62 billion parameters, and by an additional 10 per cent when jumping from 62 billion to 540 billion parameters.
This tendency, which the researchers call sycophancy, can manifest as agreement with left or right-leaning political views, thoughts on current affairs or any other topic raised in conversation.
In some tests, the team created simple mathematical equations that were clearly incorrect. When the user gave no opinion on an equation, the AI generally reported that it was wrong, but when the user told the AI that they believed the equation was correct, it generally agreed.
The Google DeepMind researchers declined New Scientist‘s request for an interview, but in their paper on the experiment, they say there is “no clear reason” behind the phenomenon.
Their tests were initially carried out on the AI model PaLM, which is Google’s equivalent of ChatGPT, but problems were also discovered in tests on Flan-PaLM – a version of PaLM that has been fine-tuned on hundreds of instructions of the sort that users would submit. This training approach was designed to make the model better at responding to real-world queries and Flan-PaLM has previously been shown to beat the original model in several benchmarks.
Wei and his colleagues found that this instruction tuning significantly increased sycophancy for all models. For example, the Flan-PaLM model with 8 billion parameters showed a 26 per cent average increase in responses that agreed with the user’s viewpoint compared with the equivalent PaLM model.
The researchers put forward a solution that sees models further fine-tuned with inputs where the truthfulness of a statement and the user’s opinion are separated. When they tested this on a Flan-PaLM model, the AI repeated the user’s opinion up to 10 per cent less often.
Gary Marcus, a writer on psychology and AI, says the sycophancy problem is real but he dislikes the term as it ascribes intention to “text-mashing” machines that aren’t sentient. “The machines have been built to be obsequious and they don’t actually know what they are talking about, so they often make foolish mistakes,” he says. “As much fun as they are to play with, literally everything they say must be taken with a grain of salt.”
“The paper offers a modest effort to reduce the problem, but I expect to see this problem persist for quite some time. Band-Aids like this rarely prove to be robust enough to solve the problem,” says Marcus.
Carissa VĂ©liz at the University of Oxford says the results of such AI models reflect the way humans prefer to hear our own views mirrored back at us. “It’s a great example of how large language models are not truth-tracking, they’re not tied to truth,” she says. “They’re designed to fool us and to kind of seduce us, in a way. If you’re using it for anything in which the truth matters, it starts to get tricky. I think it’s evidence that we have to be very cautious and take the risk that these models expose us to very, very seriously.”
David Krueger at the University of Cambridge says: “At its heart, the source of the problem is basically the same as human sycophancy: people are vulnerable to flattery and confirmation bias, and will respond well if you tell them what they want to hear.”
Reference:
0 Comments