In computer science, NP refers to problems whose solutions can be verified quickly, even if finding a solution may take an enormous amount of time.
This class has guided much of modern complexity theory. Its quantum counterpart is QMA (Quantum Merlin-Arthur), where the proof comes not as a string of bits but as a fragile quantum state.
Researchers now say OpenAI’s GPT-5 has helped prove strict limits on error reduction in QMA. The model suggested a mathematical expression that led to a breakthrough on how far that error reduction can go.
The study may be one of the first times artificial intelligence has supplied a concrete step in quantum complexity research.
Setting boundaries in QMA
Scott Aaronson of the University of Texas at Austin and Freek Witteveen of CWI Amsterdam wrote the new paper, Limits to black-box amplification in QMA, now on arXiv.
Their work builds on Stacey Jeffery and Witteveen’s 2025 result and extends Aaronson’s 2008 oracle separation.
In QMA, a prover called Merlin sends a quantum witness to a verifier called Arthur. Arthur runs a quantum algorithm to decide whether to accept. Two numbers define these systems. Completeness is the chance that Arthur accepts a valid proof. Soundness is the chance he wrongly accepts a false one.
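In the standard textbook notation (a sketch on our part, not drawn from the new paper), those two numbers can be written as follows:

```latex
% Completeness c and soundness s for a QMA verifier, sketched from the
% description above. V stands for Arthur's verification procedure and
% |psi> for a quantum witness sent by Merlin (both symbols are ours).
\begin{align*}
\text{yes instance:}\quad & \exists\,\lvert\psi\rangle:\ \Pr\bigl[V \text{ accepts } \lvert\psi\rangle\bigr] \ge c && \text{(completeness)}\\
\text{no instance:}\quad  & \forall\,\lvert\psi\rangle:\ \Pr\bigl[V \text{ accepts } \lvert\psi\rangle\bigr] \le s && \text{(soundness)}
\end{align*}
```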
Amplification methods can shrink these errors by repeating the test and combining the results. Jeffery and Witteveen showed that completeness can be pushed doubly exponentially close to one. The open question was whether amplification could go even further.
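Read as a formula (our paraphrase, with poly(n) standing for some polynomial in the input size; the precise statement is in their paper), "doubly exponentially close to one" means a completeness of roughly this form:

```latex
% "Doubly exponentially close to one", read as a completeness bound;
% poly(n) denotes some polynomial in the input size n.
\[
  c \;\ge\; 1 - 2^{-2^{\mathrm{poly}(n)}}
\]
```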
GPT-5’s key contribution
Aaronson struggled with the analysis and turned to GPT-5. The model’s first suggestions were wrong. After back-and-forth, it proposed reframing the problem with a single function that measured how close acceptance came to certainty.
That idea proved decisive. Using approximation theory, the researchers showed that black-box amplification cannot push completeness more than doubly exponentially close to one, nor drive soundness below exponentially small.
Aaronson wrote on his blog Shtetl-Optimized: “Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes.”
The proof shows that black-box amplification has hit its ceiling: completeness can get no closer to one than doubly exponentially close, and soundness can get no smaller than exponentially small.
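Schematically, reading "exponentially small" as 2^{-poly(n)} (our rendering of the bounds as described in this article, omitting the paper's exact resource-dependent parameters), the ceiling looks like this:

```latex
% The ceiling for black-box amplification as described above:
% completeness error no smaller than doubly exponentially small,
% soundness no smaller than exponentially small.
\[
  1 - c \;\ge\; 2^{-2^{\mathrm{poly}(n)}},
  \qquad
  s \;\ge\; 2^{-\mathrm{poly}(n)}
\]
```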
The result confirms that resolving whether QMA equals QMA1, the variant with perfect completeness, will need nonrelativizing methods, which analyze the structure of circuits rather than treating them as black boxes. The asymmetry is also clear: completeness depends on a single good witness, while soundness must hold against every possible witness.
Some critics said GPT-5’s insight was obvious. Aaronson responded: “GPT5-Thinking’s suggestion of a function to use ‘should have’ been obvious to us. It would have been obvious to us had we known more, or had we spent more time studying the literature or asking experts.”
The study leaves big questions open, including whether QMA equals QMA1. But it marks a turning point: AI is no longer just drafting papers or writing code.
In this case, it helped close a decades-long gap in one of the most abstract areas of computer science.