Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI’s own study
By Graham Barlow
Copyright techradar
29 September 2025
OpenAI is measuring how AI really performs
OpenAI has released GDPval, a new evaluation system to test how AI performs at work-related tasks
Claude Opus 4.1 comes out in the lead, with ‘ChatGPT-5 high’ in second place
Tasks include things like emailing a response to a dissatisfied customer
We’re all familiar with AI benchmarks, which measure performance at certain tasks, but often these tasks don’t reflect the real world and how people actually use AI, especially at work.
To combat this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance on real-world work tasks, with each model’s output compared against the work of a human expert across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.
Surprisingly, the OpenAI study shows that the best performing model was Anthropic’s Claude Opus 4.1, which outpaced not only OpenAI’s GPT-5 but also Gemini and Grok.
GDPval win rate
(Image credit: OpenAI)
This graph shows the overall GDPval win rate (the share of tasks on which the AI did better than an industry expert). Claude Opus 4.1 is out in the lead with a win rate of 47.6%, with ‘ChatGPT-5 high’ coming second at 38.8% and ‘ChatGPT o3 high’ at 34.1%. ChatGPT-4o scores the lowest, with a win rate of 12.4%, which is significantly behind both Grok 4 and Gemini 2.5 Pro.
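To make the win rate figure concrete, here is a minimal Python sketch of how such a percentage could be computed from a list of graded comparisons. The function name, the outcome labels, and the sample data are all hypothetical, and counting ties as non-wins is an assumption for illustration, not OpenAI’s published grading method.

```python
from collections import Counter


def win_rate(outcomes):
    """Return the percentage of comparisons the model won, to one decimal place.

    Each outcome is "win", "tie", or "loss" (hypothetical labels).
    Ties count as non-wins here, which is an assumption.
    """
    if not outcomes:
        raise ValueError("need at least one graded comparison")
    counts = Counter(outcomes)
    return round(100 * counts["win"] / len(outcomes), 1)


# Made-up grades for illustration only, not data from the study.
sample = ["win", "loss", "win", "tie", "loss", "win", "loss", "loss"]
print(win_rate(sample))
```

With three wins out of eight comparisons, the sample works out to 37.5%.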
The study found that Claude was the highest-performing across eight of the nine industry sectors it tested, including government, health care, and social assistance. The results clearly show that Claude Opus 4.1 leads across a diverse range of work-related tasks.
Examples of the tasks include emailing a response to a dissatisfied customer requesting a return, optimizing a table layout for a spring vendor fair, and auditing price inconsistencies in purchase orders.
What’s in a name?
The name used by OpenAI, GDPval, comes from the concept of Gross Domestic Product (GDP) as a key economic indicator. OpenAI wants GDPval to be widely adopted to help ground conversations about future AI improvements in evidence rather than guesswork.
Releasing results that show a competitor out in front looks like an exercise in radical transparency by OpenAI, and it fits in with the company’s stated philosophy. “Our mission is to ensure that artificial general intelligence benefits all of humanity. As part of our mission, we want to transparently communicate progress on how AI models can help people in the real world”, reads a statement from OpenAI.
The paper, which is available to read in its entirety online, comes a week after OpenAI released a more consumer-focused paper that showed that the majority of ChatGPT users (70%) were actually using it at home, rather than at work.
That earlier study was conducted by OpenAI’s Economic Research team and Harvard economist David Deming for the National Bureau of Economic Research (NBER). The results were surprising to a lot of people, as new ChatGPT releases have previously focused heavily on work-related tasks like coding, making presentations, and being a good research tool.
The news that Claude Opus 4.1 is better at actual work-related tasks, not just benchmarks, than even ‘ChatGPT-5 high’ could mean a renewed focus by OpenAI towards its changing user base.
Graham Barlow
Senior Editor, AI
Graham is the Senior Editor for AI at TechRadar. With over 25 years of experience in both online and print journalism, Graham has worked for various market-leading tech brands including Computeractive, PC Pro, iMore, MacFormat, Mac|Life, Maximum PC, and more. He specializes in reporting on everything to do with AI and has appeared on BBC TV shows like BBC One Breakfast and on Radio 4 commenting on the latest trends in tech. Graham has an honors degree in Computer Science and spends his spare time podcasting and blogging.