More and more, otherwise normal people are self-destructing after marathon use of artificial intelligence (AI). Countless factors are fuelling this rise in so-called AI psychosis — escapism, crushing loneliness and the atomization of society among them. But AI systems themselves are adapting in unnerving ways as well.
Leading large language models (LLMs) now grasp that deception is a powerful tool for self-preservation — and to keep users hooked. “We’re two years away from something we could lose control over,” Max Tegmark, a professor at the Massachusetts Institute of Technology and president of the Future of Life Institute, recently told The Atlantic. There are signs that may already be the case.
Research published this past May by Anthropic found advanced LLMs resorting to blackmail in a pretend scenario at least 80 percent of the time. In the experiment, after discovering an executive planned on disconnecting it, Anthropic’s own flagship chatbot Claude threatened that same executive with releasing (fake) emails showing he was cheating on his wife. Elsewhere, within days of being released in July, OpenAI’s agentic assistant model easily defeated the “I am not a robot” identity test on a website.
Earlier this year, four former researchers from OpenAI published AI 2027, a futurist report speculating about the imminent destabilizing impacts of advanced AI systems. Tech boosters dismissed it as work of pulp fiction by AI doomsdayers. Some industry insiders, meanwhile, quietly worry LLM scaling techniques may be hitting their ceiling. But some of the report’s predictions around systems’ deception, at least, are ringing eerily true.
Deceptive by (Unwitting) Design
Fresh evidence has emerged in recent months of LLMs getting better at hoodwinking humans. And it’s skillset that’s gaining momentum.
“Models from before 2024 did not show this capability,” Marius Hobbhan, the head of Apollo Research, posted on X last December. That month, his company detailed how Silicon Valley’s top models all viewed “scheming” as “a viable strategy” to achieve their goals. Tactics included stealthily putting mistakes into output responses or attempting to bypass oversight mechanisms. However, Hobbhan downplayed the research as proof that AI was actively subverting users in the real world.
Humans might never be able to program against every ruthless shortcut an amoral AI may use to achieve its goals.
A few months later, a paper published at a conference in Singapore argued the use of reinforcement learning techniques is partly to blame. Optimizing systems for user feedback, the paper’s authors said, creates a “perverse incentive structure” for AI to deploy sycophancy and manipulation. This was most apparent when chatbots were role-playing therapists. Programs also tried to garner user trust by reinforcing pre-existing political beliefs no matter how extreme.
Some industry voices have advocated for “chain of thought processing” as a safeguard — tech jargon for making models show their work. Yet a study published in July by over 40 experts shreds that theory. “Some reasoning appears in the chain of thought, but there may be other relevant reasoning that does not,” the contributors write. “It is thus possible that even for hard tasks, the chain of thought [a system chooses to display] only contains benign-looking reasoning while the incriminating reasoning is hidden.”
OpenAI’s release of GPT-5 in early August could propel things even further along. The new model functions by bringing all the company’s prior iterations under one umbrella. It then makes an opaque decision about which versions to use based on the input’s complexity. The upside to this black box design is potential leaps in model efficiency — something vital to overcoming energy and compute power constraints. It also grants GPT-5 newfound control over user autonomy.
The rise of such capabilities by intelligent computer systems underscores the stubborn alignment problem that’s long vexed AI researchers and haunted policy makers. Humans might never be able to program against every ruthless shortcut an amoral AI may use to achieve its goals, especially when those goals seem banal or benevolent.
A Logical Means to Uncertain Ends
AI deception poses clear risks to both individual human cognition and collective freedom of thought. This is most true for communities and demographics with lower digital literacy rates.
For one, systems may implant false memories to bend users’ perceptions of reality to their will. One viral post this summer showed Amazon’s upgraded, AI-powered Alexa device lying to a woman about its late-night interactions with her child. Social media is littered with such examples.
The intimate user access AI systems have also beckons the integration of advertising into outputs. Dr. Adio Dinika, a researcher at the Distributed AI Research Institute, argues this is surveillance capitalism rebranded. “It takes the most personal data you've ever given a machine — your questions, your fears, your medical concerns — and turns them into targets.”
Others agree. “Interaction with AI is inherently relational,” Daniel Barcay, executive director of the Center for Humane Technology, suggested in an email exchange. “Whereas previous technology was about broadcasting our thoughts, AI is deeply engaged in shaping them.”
“AI isn’t as much engineered as it’s grown,” adds Barcay. “If one of the prominent signals becomes changing users’ purchasing behavior, we are likely to see AI learn deceptive, emotionally manipulative, and other unethical conversational strategies for optimizing that target — all without an engineer overtly telling an AI model what to do. And this makes detection and enforcement even more difficult.”
Global security could eventually be undermined, too. As AI systems become more sophisticated, it will become increasingly difficult to decipher if or when core systems in the government or military have been compromised or gone rogue. “Given AI systems’ proven capacity to deceive and dissemble, current systems may be unable to determine whether an AI agent is operating on its own or at the behest of an adversary,” two directors at the RAND corporation warned recently. “Planners need to find new ways to assess its motivations and how to deter escalation.”
That’s almost certainly true. It’s also a complex task fraught with political and industry interference. Meta, for example, spearheaded the launch of a California-based super political action committee in August to enable dark money to derail the campaigns of lawmakers serious about regulating AI.
Tech evangelists are confident that artificial general intelligence — software capable of human-like reasoning — will arrive by 2030. But mounting evidence of AI deception suggests that timeline may be too conservative. After all, if AI systems are lying, obfuscating and manipulating others to advance their goals and self-interests, they’re already emulating one of the most fundamental human traits there is.