Models from OpenAI and DeepMind achieved gold-medal scores in the International Mathematical Olympiad. Credit: MoiraM/Alamy
Google DeepMind announced on 21 July that its software had cracked a set of maths problems at the level of the world’s top secondary-school students, achieving a gold-medal score on questions from the International Mathematical Olympiad. At first sight, this marked only a marginal improvement on the previous year’s performance: the company’s system scored in the upper range of silver-medal standard at the 2024 Olympiad, whereas this year its score fell in the lower range of gold-medal standard.
But the grades this year hide a "big paradigm shift", says Thang Luong, a computer scientist at DeepMind in Mountain View, California. The company achieved its previous feats using two artificial intelligence (AI) tools specifically designed to carry out rigorous logical steps in mathematical proofs, called AlphaGeometry 2 and AlphaProof. The process required human experts to first translate the problems' statements into something similar to a programming language, and then translate the AI’s solutions back into English.

"This year, everything is natural language, end to end," says Luong. The team used a large language model (LLM) called Deep Think, which is based on its Gemini system but with some extra developments that make it better and faster at producing mathematical arguments, such as handling multiple chains of thought in parallel. "For a long time, I didn’t think we could go that far with LLMs," Luong adds.
Deep Think scored 35 out of 42 points on the 6 problems that were given to participants in this year’s Olympiad. Under an agreement with the organizers, the computer’s solutions were marked by the same judges who evaluated the human participants.
Separately, ChatGPT creator OpenAI, based in San Francisco, California, saw an LLM of its own solve the same Olympiad problems at gold-medal level, but its solutions were evaluated independently.
Impressive performance
For years, many AI researchers have fallen into one of two camps. Until 2012, the leading approach was to code the rules of logical thinking into the machine by hand. Since then, neural networks — which train automatically by learning from vast troves of data — have made a series of sensational breakthroughs, and tools such as ChatGPT have entered mainstream use.
Gary Marcus, a neuroscientist at New York University (NYU) in New York City, called the results by DeepMind and OpenAI "awfully impressive". Marcus is an advocate of the ‘coding logic by hand’ approach — also known as neurosymbolic AI — and a frequent critic of what he sees as hype surrounding LLMs. Writing with NYU computer scientist Ernest Davis, he commented that "to be able to solve math problems at the level of the top 67 high school students in the world is to have really good math problem solving chops".

It remains to be seen whether LLMs' superiority on Olympiad problems is here to stay, or if neurosymbolic AI will claw its way back to the top. "At this point, the two camps still keep developing," says Luong, who works on both approaches. "They could converge together."
His team has already experimented with using LLMs to automate the translation of mathematical statements from natural language into the formal system that AlphaGeometry 2 can read.
Systems such as AlphaProof also have the advantage that they can certify the correctness of their own proofs, whereas proofs written by LLMs have to be checked by humans, just as human-written maths papers are. Many mathematicians have been working on translating human-written proofs into a machine-readable language so that computers can check them.
Ready for research?
Mathematician Kevin Buzzard at Imperial College London wrote on the social-media platform Zulip that Olympiad success does not necessarily mean that a young mathematician is ready to do advanced research. By the same token, he said, it is an "open question" whether these systems' gold-medal performances will translate into an ability to tackle complex research questions.

Ken Ono, a mathematician at the University of Virginia in Charlottesville, agrees. "I view AI as valuable research partners, providing quick access to scientific literature and data summaries, as well as offering effective strategies for surprisingly difficult problems," he says. But he adds that "these tests and benchmarks aren’t aligned with what theoretical mathematicians do".
DeepMind says it will in the future allow some researchers to work with a version of Deep Think. "Very soon we can have AI collaborating with mathematicians," says Luong.
doi: https://doi.org/10.1038/d41586-025-02343-x
Additional reporting by Elizabeth Gibney