OpenAI Researchers Sebastian Bubeck and Ernest Ryu: How AI Accelerates Mathematical Research and Moves Toward Universal Science

2 hours ago

Compiled by: Techub News

In the latest episode of the OpenAI podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu sat down for an in-depth conversation with host Andrew Mayne. Both have strong mathematical backgrounds (Bubeck was a professor at Princeton University, while Ryu taught in the UCLA mathematics department), and at OpenAI they now work at the frontier where AI meets mathematics. They discussed the "miraculous" advances in AI's mathematical capabilities over the past few years, breakthrough moments from their own experience, and what this means for the entire scientific community.

From "Unable to Calculate" to Solving Olympiad Problems: The Leap in AI's Mathematical Abilities

Just two years ago, large language models (LLMs) had almost no mathematical capability to speak of. Sebastian Bubeck recalled that back then the concept of a "reasoning model" did not even exist, let alone models that could prove difficult theorems. Today's models, however, can assist Fields Medalists in their daily work, a leap he called "simply astonishing."

Ernest Ryu gave a more specific timeline of how the change felt to him. Shortly after ChatGPT first emerged, in early 2023, he began testing the model on everyday math problems, such as splitting a complicated bill after a three-person camping trip or scheduling Zoom meetings for people in different time zones. At that time, the model could not handle these problems at all. "But around mid-2025, things suddenly changed," Ryu said. He was not an OpenAI employee then and did not know exactly what had happened internally, but the model suddenly began to solve problems at the International Mathematical Olympiad (IMO) level and started to touch on research-level mathematics.

Bubeck looked back to an even earlier point, recalling Minerva, the mathematical model Google launched four years ago (before ChatGPT was released). "I was so impressed I almost fell out of my chair," he said, because that model could actually give the equation of a line passing through points specified by their coordinates in the plane. "It's almost hard to understand now: what's so difficult about this? The model can certainly do it." He lamented that we seem to have forgotten just how quickly progress has come.
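For readers who want a concrete picture of the task Bubeck describes, here is a minimal sketch of computing the line through two points. The function name `line_through` and the sample points are illustrative assumptions, not anything from the podcast or from Minerva itself.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (slope, intercept) of the non-vertical line through p and q.

    Points are (x, y) pairs with integer coordinates; exact fractions are
    used so the result is not affected by floating-point rounding.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        # A vertical line x = x1 has no slope-intercept form.
        raise ValueError("vertical line: no slope-intercept form")
    slope = Fraction(y2 - y1, x2 - x1)   # rise over run
    intercept = y1 - slope * x1          # solve y1 = slope * x1 + intercept
    return slope, intercept


# Illustrative points (assumed): the line through (1, 2) and (3, 8).
slope, intercept = line_through((1, 2), (3, 8))
```

With these sample points the slope is (8 - 2) / (3 - 1) = 3 and the intercept is 2 - 3 * 1 = -1, so the line is y = 3x - 1.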

Today, Ryu sums up AI's mathematical abilities as follows: unless you are a professional mathematician trying to invent new mathematics, ChatGPT can already handle all the math you need. Physicists, chemists, and researchers in other STEM fields who rely on existing advanced tools such as differential equations and differential geometry can all get help from AI. Users still need to exercise caution and check and verify the output, but in Ryu's view AI already covers the mathematical needs of 99% of the population.

Firsthand Experience: Using ChatGPT to Solve a 42-Year-Old Open Problem

Theoretical advances are exciting, but firsthand experience is more convincing. Ernest Ryu shared his personal experience of using ChatGPT to solve a 42-year-old open problem in optimization theory.

The problem concerns the famous Nesterov accelerated gradient method. It is known in academia that the algorithm converges in most cases, but it had never been settled whether it can diverge in extremely bad cases. This was a genuinely open question.
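For context, here is a minimal sketch of one common textbook form of Nesterov-style accelerated gradient descent, run on a toy convex quadratic. The article does not say which variant Ryu studied, so the momentum schedule, the step size of 1/L, and the test function below are all assumptions chosen for illustration; this is not a reconstruction of the open problem or of its proof.

```python
import math


def nesterov_agd(grad, x0, step, iters):
    """Accelerated gradient descent with a standard momentum schedule.

    grad  : function returning the gradient (list of floats) at a point
    x0    : starting point (list of floats)
    step  : step size, assumed here to be 1/L for an L-smooth objective
    iters : number of iterations
    """
    x = list(x0)   # main iterate
    y = list(x0)   # extrapolated (look-ahead) point
    t = 1.0        # momentum parameter
    for _ in range(iters):
        g = grad(y)
        # Gradient step taken from the extrapolated point.
        x_next = [yi - step * gi for yi, gi in zip(y, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        # Momentum: extrapolate along the direction of the last move.
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_next, x)]
        x, t = x_next, t_next
    return x


# Toy objective (assumed): f(x) = sum(0.5 * d_i * x_i^2 - x_i),
# which is convex and minimized at x_i = 1 / d_i.
diag = [1.0, 10.0, 100.0]


def f(x):
    return sum(0.5 * d * xi * xi - xi for d, xi in zip(diag, x))


def grad_f(x):
    return [d * xi - 1.0 for d, xi in zip(diag, x)]


x_star = [1.0 / d for d in diag]
x_hat = nesterov_agd(grad_f, [0.0, 0.0, 0.0], step=1.0 / max(diag), iters=2000)
gap = f(x_hat) - f(x_star)   # how far the objective is from its minimum
```

On a well-behaved problem like this one, the objective gap shrinks quickly. The open question described above is about what can happen in far less friendly cases.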

Ryu decided to give ChatGPT a try. He didn't simply input the question and wait for an answer; instead, he took on the roles of "validator" and "guide." Over three nights (about 4 hours each night, totaling 12 hours), he engaged in intensive interaction with ChatGPT concerning this question. Whenever the model made a mistake, he corrected it; at the same time, he tried to guide the conversation towards research paths he found novel.

Ultimately, ChatGPT generated a proof. Ryu checked it carefully and even had ChatGPT double-check it to confirm that the proof was correct. "And just like that, this 42-year-old open problem was solved." To announce the result in the most interesting way, he chose not to write a formal paper and instead shared the experience on Twitter (now X), where it drew widespread attention and discussion. This may be one of the earliest cases of a genuine open mathematical problem being solved with AI assistance.

Bubeck added that this "professor-student" interaction model significantly compressed the research timeline. Without AI, solving the same problem could take months or even longer. AI has reduced that process to mere hours.

Beyond Literature Search: AI Begins to Produce New Mathematical Results

Many of the early successful cases of AI in mathematics involved "deep literature search." For example, GPT once scanned thousands of papers to find the answer to a specific Erdős problem in an entirely unrelated mathematical field and completed the reasoning work that connected the two parts. This was already quite impressive.

Subsequently, the OpenAI team began to test models systematically on the list of Erdős problems. Bubeck recalled that team member Mark Sellke tried to have the model solve every problem on the list, and the model returned "solutions" for 10 of them. Bubeck shared this outcome on Twitter, which led to misunderstanding and controversy (including a debate with Google DeepMind co-founder Demis Hassabis), because people mistakenly believed these 10 solutions were completely original and unprecedented.

"But the current outcome is even more astonishing," Bubeck said. "Months later, we actually have more than 10 completely novel solutions that can be published in top combinatorics journals, some derived from ChatGPT and others from our internal models." This clearly shows how progress is accelerating: from finding answers in the literature to genuinely producing new mathematical insights, all within a few months.

This raises a deeper question: is scientific progress really the reorganization of, and reasoning over, existing knowledge, or does it require a human genius's "moment of inspiration"? Bubeck believes there is no definitive answer yet, but AI is reorganizing and reasoning at an unprecedented scale, which in itself may expand human knowledge without limit.

Why is Mathematics a Key Benchmark for AGI?

Why is OpenAI so focused on AI's mathematical capabilities? Sebastian Bubeck provided two core reasons.

First, mathematical problems are clear and unambiguous: below the research level, there is consensus on what a problem asks and on whether an answer is correct. This makes mathematics a perfect benchmark for measuring model progress. Over the past four years, it has played this "yardstick" role very well.

Second and more importantly, mathematics requires long-term and coherent thinking. Solving a mathematical problem can take days, weeks, or even years. If there is even one mistake in the entire reasoning chain, the entire argument collapses. This characteristic aligns perfectly with our expectations of "reasoning models": they need to be able to think coherently for long periods and self-correct when they make mistakes.

"We hope to generalize this property obtained through mathematics to other fields," Bubeck said. "By the way, this is exactly the reason why humans train mathematical thinking." Mathematics cultivates precisely the rigorous, logical thinking skills that are crucial for building AGI capable of making complex scientific discoveries.

He proposed the concept of "AGI time": the length of time AI can simulate human thought. Two years ago, models might be able to simulate a high school student thinking about a problem for a few minutes; now, they can simulate a researcher thinking for hours or even days. "We hope to advance to weeks or even months. This is open research; I think no one on Earth knows exactly how to do it… but we are moving towards automated researchers."

Ernest Ryu added a technical perspective on possible paths to "long thinking." Currently, people usually interact with ChatGPT within a limited context window (roughly the length of a 50-page mathematics paper), which is not enough to produce truly deep mathematical breakthroughs. However, pointing to Codex's ability to handle very long codebases, he suggested that future LLMs could likewise manage ultra-long "mathematical notes," compressing and summarizing dialogues to sustain thought over weeks or months and ultimately producing papers that crystallize the results of that long-term thinking.

How Will AI Reshape Mathematics and Science?

The two researchers painted a picture of a future where AI is deeply integrated into mathematics and broader scientific research:

  • Interconnected and Accelerated Mathematics: Ryu pointed out that much cutting-edge mathematical research is quite niche, and a paper may be of interest to only five living people. But AI will read and remember all papers. In the future, a result that has languished unnoticed for 20 years might be rediscovered, even 100 years on, and applied to a completely new field. Mathematics will become a much more interconnected whole. Meanwhile, AI can greatly accelerate the verification of mathematics. Currently, verifying an important 300-page proof may take years and can still go wrong. In the future, AI could quickly perform preliminary validation and flag errors, leaving human experts to make the final judgment on that basis and greatly increasing the credibility and iteration speed of mathematical results.
  • Empowering All Scientists: Bubeck emphasized that OpenAI's training techniques are universal, and advancements in mathematics herald similar breakthroughs across all scientific fields. Mathematicians can easily write code for experiments with Codex, while scientists in other fields can leverage ChatGPT to use more advanced mathematical tools. The process of scientific discovery will be "literally accelerated."
  • The Evolution of Human Roles, Not Their Disappearance: Bubeck soberly predicts that within a year or two, models may be able to complete most of the foundational work done by human researchers. But this does not mean scientists will no longer be needed. On the contrary, expertise is more valuable than ever: it is scientists' deep domain knowledge that will guide AI toward true breakthroughs. The purpose of science is to understand and solve problems (such as curing diseases), and AI itself does not care about these goals; it requires humans to set the direction and maintain control.
  • Caution Against the Risk of "Cognitive Atrophy": Bubeck also expressed concern about over-reliance on tools. If humans simply have AI explain everything, without the "patience of days or even weeks of in-depth study" required to truly understand a result, their grasp of knowledge may become superficial. He warned that there have already been cases of non-professional researchers using AI to generate dozens of pages of erroneous "proofs." "We need better scientists than ever," he said, urging the academic community to understand the speed of this progress and to reposition its role in the process.

Advice for Mathematics Enthusiasts: Start with Chatting

For those interested in mathematics but who may feel they are not "math people," the two researchers offered simple yet powerful advice: Chat with ChatGPT.

Ryu shared his own learning approach: he used to check Wikipedia, but the content was often too dense; now he asks ChatGPT directly and follows up with questions that target his knowledge gaps. You can describe your mathematical background and the books you have read, then let it propose a question that is understandable at your level, possibly an open-ended one. Then you can explore solutions together, continually deriving new questions and variants. "This makes (mathematical research) feel less lonely, as mathematics is inherently a social endeavor." Bubeck also specifically pointed out that people have not yet fully appreciated the ability of LLMs to ask good questions, which is precisely where exploration begins.

Host Andrew Mayne offered a more relaxed starting point: begin with fun estimation questions such as "How many M&M's can fit in your bathtub?" or "How many words did you read last year?" and start a dialogue with the AI. From there, you may find yourself drawn, almost without noticing, into a more complex mathematical world and begin to understand how it will affect you.
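As an illustration of how such an estimation question breaks down into a few multiplications, here is a rough sketch of the bathtub example. Every number in it (tub volume, candy volume, packing fraction) is an assumed ballpark figure chosen for illustration, not something stated in the podcast, and changing any of them changes the answer.

```python
# Rough (Fermi-style) estimate: how many M&M's fit in a bathtub?
# All inputs below are assumptions for illustration only.

TUB_VOLUME_LITERS = 150.0   # assumed usable volume of a typical bathtub
CANDY_VOLUME_CM3 = 0.64     # assumed volume of one plain M&M
PACKING_FRACTION = 0.68     # assumed share of space the candies fill;
                            # the rest is air gaps between them

CM3_PER_LITER = 1000.0


def estimate_candies(tub_liters, candy_cm3, packing):
    """Estimate how many candies fit, allowing for gaps between them."""
    tub_cm3 = tub_liters * CM3_PER_LITER
    filled_cm3 = tub_cm3 * packing   # volume actually occupied by candy
    return filled_cm3 / candy_cm3


estimate = estimate_candies(TUB_VOLUME_LITERS, CANDY_VOLUME_CM3, PACKING_FRACTION)
print(f"Roughly {round(estimate, -4):,.0f} M&M's")
```

The point of the exercise is the order of magnitude rather than the exact figure: under these assumptions the answer lands in the low hundreds of thousands, and asking the AI to challenge each assumption is a natural next step in the conversation.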

Mathematics is becoming more interesting, interconnected, and powerful than ever, and human researchers, equipped with AI, will stand at the center of this new era.
