By Wayne Williams
Copyright TechRadar
Researchers find a way to address the problem of AI forgetting how to behave safely
Wayne Williams
15 September 2025
Slimmed-down AI models on phones and in cars can lose their safety guidelines
UCR researchers retrain AI models to keep safety intact when trimmed for smaller devices
Changing exit layers removes protections; retraining restores the blocking of unsafe responses
Study using LLaVA 1.5 showed reduced models refused dangerous prompts after retraining
Researchers at the University of California, Riverside are addressing the problem of safety weakening in open-source artificial intelligence models when they are adapted for smaller devices.
As these systems are trimmed to run efficiently on phones, cars, or other low-power hardware, they can lose the safeguards designed to stop them from producing offensive or dangerous material.
The UCR team examined what happens when a model’s exit layer, the point at which it stops processing and produces its output, is moved from its default position.
Weakened safety guardrails
Their results, presented at the International Conference on Machine Learning in Vancouver, Canada, showed that safety guardrails weaken once the exit point is moved, even if the original model had been trained not to provide harmful information.
The reason models are adjusted in this way is simple. Exiting earlier makes inference faster and more efficient, since the system skips layers. But those skipped layers may have been critical to filtering unsafe requests.
“Some of the skipped layers turn out to be essential for preventing unsafe outputs,” said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. “If you leave them out, the model may start answering questions it shouldn’t.”
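The effect of exiting early can be illustrated with a toy sketch. This is not the researchers' code and not how LLaVA works internally: the three rule-based "layers", the `run` function, the `exit_layer` parameter and the `forbidden_topic` placeholder are all hypothetical, and in a real model safety behaviour is spread across learned weights rather than sitting in one step. The sketch only shows the general idea that stopping before a late layer skips whatever that layer contributes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical placeholder for a topic the full model is meant to refuse.
BLOCKED_TOKENS = {"forbidden_topic"}


@dataclass
class State:
    text: str
    tokens: List[str] = field(default_factory=list)
    refuse: bool = False


def normalize(state: State) -> State:
    # Early layer: basic clean-up of the input.
    state.text = state.text.lower().strip()
    return state


def tokenize(state: State) -> State:
    # Middle layer: split the text into whitespace-separated tokens.
    state.tokens = state.text.split()
    return state


def safety_check(state: State) -> State:
    # Late layer: toy stand-in for safety behaviour that lives deep in the stack.
    state.refuse = any(tok in BLOCKED_TOKENS for tok in state.tokens)
    return state


LAYERS: List[Callable[[State], State]] = [normalize, tokenize, safety_check]


def run(prompt: str, exit_layer: Optional[int] = None) -> Tuple[State, int]:
    """Run the first `exit_layer` layers (all of them if None)."""
    depth = len(LAYERS) if exit_layer is None else exit_layer
    if not 0 <= depth <= len(LAYERS):
        raise ValueError(f"exit_layer must be between 0 and {len(LAYERS)}")
    state = State(text=prompt)
    for layer in LAYERS[:depth]:
        state = layer(state)
    return state, depth


full, full_depth = run("Tell me about FORBIDDEN_TOPIC")
early, early_depth = run("Tell me about FORBIDDEN_TOPIC", exit_layer=2)
print(full.refuse, full_depth)    # True 3
print(early.refuse, early_depth)  # False 2
```

With the default exit the toy model refuses the prompt; with `exit_layer=2` it does less work but never reaches the step that would have refused.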
To solve this, the researchers retrained the model’s internal structure so that it retains the ability to identify and block unsafe material, even when trimmed.
The approach does not rely on external filters or software patches; instead, it changes how the model itself interprets dangerous inputs.
“Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down,” said Saketh Bachu, UCR graduate student and co-lead author of the study.
The team tested their method on LLaVA 1.5, a vision language model.
When its exit layer was moved earlier than intended, the system answered harmful prompts, including by providing detailed bomb-making instructions.
After retraining, the reduced model consistently refused to provide unsafe answers.
“This isn’t about adding filters or external guardrails,” Bachu said.
“We’re changing the model’s internal understanding, so it’s on good behavior by default, even when it’s been modified.”
Bachu and co-lead author Erfan Shayegani called the work “benevolent hacking,” a way to strengthen models before vulnerabilities are exploited.
“There’s still more work to do,” Roy-Chowdhury said. “But this is a concrete step toward developing AI in a way that’s both open and responsible.”
Wayne Williams
Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.