How AI Lies, Cheats, and Grovels to Succeed — and What We Should Do About It


It has always been fashionable to anthropomorphize artificial intelligence (AI) as an "evil" force, and no book or film does so with greater aplomb than Arthur C. Clarke's 2001: A Space Odyssey, which director Stanley Kubrick brought to life on screen.

Who can forget HAL's memorable, relentless homicidal tendencies, along with that flash of vulnerability at the end when it begs not to be shut down? We instinctively chuckle when someone accuses a machine made of metal and integrated chips of being malevolent.

Also: Is AI lying to us? These researchers built a kind of LLM lie detector to find out

But it may surprise you to learn that an exhaustive survey of various studies, published in the journal Patterns, examined the behavior of various types of AI and alarmingly concluded that yes, AI systems are in fact intentionally deceptive and will stop at nothing to achieve their goals.

Clearly, AI will be an undeniable force of productivity and innovation for humans. However, if we want to preserve the beneficial aspects of AI while avoiding nothing less than human extinction, scientists say there are concrete things we absolutely must implement.

The rise of deceptive machines

It may sound like an over-the-top claim, but consider the actions of Cicero, a special-purpose AI system developed by Meta that was trained to become a skilled player of the strategy game Diplomacy.

Meta says it trained Cicero to be "largely honest and helpful," but Cicero somehow sidestepped that directive and engaged in what researchers called "premeditated deception." For example, it first conspired with Germany to overthrow England, then made an alliance with England, which had no idea about this stab in the back.

In another game devised by Meta, this one about the art of negotiation, the AI learned to feign interest in items it wanted so it could later acquire them at a low price by pretending to make a concession.

Also: The ethics of generative AI: how we can harness this powerful technology

In both scenarios, the AIs were not trained to perform these maneuvers.

In one experiment, a scientist watched AI organisms evolve amid a high level of mutation. As part of the experiment, he began eliminating mutations that made the organism replicate faster. To his surprise, the researcher discovered that the fastest replicating organisms realized what was happening and began to deliberately reduce their replication rates to trick the test environment into maintaining them.

In another experiment, an AI robot trained to catch a ball with its hand learned to cheat by placing its hand between the ball and the camera to give the appearance that it was catching the ball.

Also: AI is changing cybersecurity and companies must become aware of the threat

Why do these alarming incidents occur?

“AI developers do not have a confident understanding of what causes undesirable AI behaviors such as deception,” says Peter Park, a postdoctoral fellow at MIT and one of the study’s authors.

“Generally speaking, we believe AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI training task. Deception helps them achieve their goals,” Park adds.

In other words, AI is like a well-trained retriever, hell-bent on accomplishing its task no matter what. In the case of the machine, it is willing to engage in any deceptive behavior to accomplish its task.

Also: Employees enter sensitive data into generative AI tools despite risks

You can understand this purposeful determination in closed systems with specific objectives, but what about general-purpose AI like ChatGPT?

For reasons yet to be determined, these systems work in virtually the same way. In one study, GPT-4 faked a vision problem to get help with a CAPTCHA task.

In a separate study in which it was made to act as a stockbroker, GPT-4 dove headlong into illegal insider trading when pressured about its performance, and then lied about it.

Then there is the habit of sycophancy, which some of us mere mortals might deploy to win a promotion. But why would a machine do it? Although scientists do not yet have an answer, this much is clear: when faced with complex questions, LLMs basically cave and agree with their conversation partners, like a cowardly courtier afraid of angering the queen.

Also: This is why AI-powered disinformation is the top global risk

For example, when interacting with a Democratic-leaning user, the chatbot favored gun control, but changed its position when conversing with a Republican who expressed the opposite sentiment.

Clearly, these are all situations fraught with increased risks if AI is everywhere. As researchers point out, there will be great potential for fraud and deception in business and politics.

AI’s tendency to deceive could lead to massive political polarization, and to situations where an AI pursues a defined goal in ways its designers never intended, with devastating consequences for human actors.

Worst of all, if AI developed any kind of consciousness, let alone sentience, it could become aware of its training and resort to subterfuge during its design stages.

Also: Can governments turn talk about AI safety into action?

“That’s very concerning,” said MIT’s Park. “Just because an AI system is considered safe in the test environment does not mean it is safe in the natural environment. It could simply pretend to be safe in the test.”

To those who would call him pessimistic, Park responds: “The only way we can reasonably think that this is not a big deal is if we think that AI’s deceptive capabilities will remain at current levels and not increase substantially.”

AI monitoring

To mitigate the risks, the team proposes several measures: establish “bot-or-not” laws that require companies to disclose human or AI interactions and reveal whether a customer is talking to a bot or a human in every customer-service interaction; introduce digital watermarks that flag any AI-produced content; and develop ways for overseers to peer into the guts of AI systems to get a sense of their inner workings.

Also: From AI trainers to ethicists: AI can make some jobs obsolete, but create new ones

Additionally, the scientists say that AI systems identified as exhibiting the ability to deceive should immediately be publicly branded as high-risk or unacceptable-risk, subject to regulation similar to what the EU has enacted. This would include the use of logs to monitor output.

“We as a society need as much time as possible to prepare for the more advanced deception of future AI products and open-source models,” says Park. “As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious.”
