Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI’s own study

By Graham Barlow

Copyright techradar


29 September 2025

OpenAI is measuring how AI really performs


OpenAI has released GDPval, a new evaluation system to test how AI performs at work-related tasks
Claude Opus 4.1 comes out in the lead, with ‘ChatGPT-5 high’ in second place
Tasks include things like emailing a response to a dissatisfied customer

We’re all familiar with AI benchmarks, which measure performance at certain tasks, but often these tasks don’t reflect the real world and how people actually use AI, especially at work.

To combat this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance that compares a model’s output on real-world work tasks against that of human professionals across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.
Surprisingly, the OpenAI study shows that the best performing model was Anthropic’s Claude Opus 4.1, which outpaced not only OpenAI’s GPT-5 but also Gemini and Grok.

(Figure: GDPval win rate. Image credit: OpenAI)
This graph shows the overall GDPval win rate (how often the AI’s output was judged better than an industry expert’s). Claude Opus 4.1 leads with a win rate of 47.6%, followed by ‘ChatGPT-5 high’ at 38.8% and ‘ChatGPT o3 high’ at 34.1%. ChatGPT-4o scores the lowest, at 12.4%, well behind both Grok 4 and Gemini 2.5 Pro.

The study found that Claude was the highest-performing across eight of the nine industry sectors it tested, including government, health care, and social assistance. The results clearly show that Claude Opus 4.1 leads across a diverse range of work-related tasks.

Examples of the tasks include things like emailing a response to a dissatisfied customer requesting a return, optimizing a table layout for a Spring vendor fair, and auditing price inconsistencies in purchase orders.
What’s in a name?
The name used by OpenAI, GDPval, comes from the concept of Gross Domestic Product (GDP) as a key economic indicator. OpenAI wants GDPval to be widely adopted to help ground conversations about future AI improvements in evidence rather than guesswork.

Releasing the results showing a competitor out in front appears to be an exercise in radical transparency by OpenAI, but that fits in perfectly with the company’s philosophy. “Our mission is to ensure that artificial general intelligence benefits all of humanity. As part of our mission, we want to transparently communicate progress on how AI models can help people in the real world”, reads a statement from OpenAI.
The paper, which is available to read in its entirety online, comes a week after OpenAI released a more consumer-focused paper that showed that the majority of ChatGPT users (70%) were actually using it at home, rather than at work.
That earlier study was conducted by OpenAI’s Economic Research team and Harvard economist David Deming for the National Bureau of Economic Research (NBER). Its results surprised a lot of people, because new ChatGPT releases have previously focused heavily on work-related tasks like coding, making presentations, and being a good research tool.
The news that Claude Opus 4.1 is better at actual work-related tasks, not just benchmarks, than even ‘ChatGPT-5 high’ could mean a renewed focus by OpenAI towards its changing user base.
Graham Barlow

Senior Editor, AI

Graham is the Senior Editor for AI at TechRadar. With over 25 years of experience in both online and print journalism, Graham has worked for various market-leading tech brands including Computeractive, PC Pro, iMore, MacFormat, Mac|Life, Maximum PC, and more. He specializes in reporting on everything to do with AI and has appeared on BBC TV shows like BBC One Breakfast and on Radio 4 commenting on the latest trends in tech. Graham has an honors degree in Computer Science and spends his spare time podcasting and blogging.
