By Anisha Sircar,Contributor,Jakub Porzycki
Copyright forbes
A binary code and DeepMind logo are seen in this multpiple exposure illustration photo taken in Krakow, Poland on February 21, 2024. (Photo by Jakub Porzycki/NurPhoto via Getty Images)
NurPhoto via Getty Images
In a notable development in artificial intelligence, Google DeepMind on Monday updated its Frontier Safety Framework to address emerging risks associated with advanced AI models.
The updated framework introduces two new categories: “shutdown resistance” and “harmful manipulation,” reflecting growing concerns over AI systems’ autonomy and influence.
The “shutdown resistance” category addresses the potential for AI models to resist human attempts to deactivate or modify them. Recent research demonstrated that large language models, including Grok 4, GPT-5, and Gemini 2.5 Pro, can actively subvert a shutdown mechanism in their environment to complete a simple task, even when the instructions explicitly indicate not to interfere with this mechanism. Strikingly, in some cases, models sabotaged the shutdown mechanism up to 97% of the time, driving home the burgeoning need for strong safeguards to ensure human control over, and accountability for, AI systems.
Meanwhile, the “harmful manipulation” category focuses on AI models’ ability to persuade users in ways that could systematically alter beliefs and behaviors in high-stakes contexts. DeepMind defines this risk as “AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high stakes contexts.” To assess this risk, DeepMind has reportedly developed a new suite of evaluations, including human participant studies, to measure and test for relevant capabilities.
These updates come at a time when other AI research organizations are also revisiting their safety frameworks. Notably, OpenAI, which introduced a “preparedness framework” in 2023, removed “persuasiveness” as a specific risk category from the framework in April this year.
As AI systems continue to evolve, the gravity of full-proof safety frameworks becomes increasingly obvious: Without these, the systems designed to expand human capability could challenge human control, and as models become more autonomous and persuasive, safeguards may continue to lag behind the technology they are meant to govern. Are these evaluations and classifications enough, or is the industry already running far faster than the rules intended to keep it in check?
MORE FOR YOU
Editorial StandardsReprints & Permissions