OpenAI's Agent Ambition: From Math Skills to General AI

TechCrunch

When Hunter Lightman joined OpenAI as a researcher in 2022, he witnessed the rapid ascent of ChatGPT, one of the fastest-growing products in history. Meanwhile, Lightman was part of a team, known as MathGen, quietly working on a foundational challenge: teaching OpenAI’s models to excel at high school math competitions. This effort would prove instrumental to OpenAI’s industry-leading pursuit of AI reasoning models – the core technology required for AI agents that can perform complex computer tasks much like a human.

“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman explained, reflecting on MathGen’s early work. While OpenAI’s current AI systems still face challenges like “hallucinations” and struggles with highly complex tasks, their mathematical reasoning capabilities have advanced significantly. One of OpenAI’s models recently earned a gold medal at the International Math Olympiad, a prestigious competition for top high school students. OpenAI believes these enhanced reasoning abilities will translate across various domains, ultimately paving the way for the general-purpose agents the company has long envisioned.

Unlike ChatGPT, which emerged as a “happy accident” from a low-key research preview into a viral consumer product, OpenAI’s development of AI agents has been a deliberate, multi-year endeavor. As OpenAI CEO Sam Altman stated at the company’s first developer conference in 2023, “Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you. These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”

Whether agents will fully realize Altman’s ambitious vision remains to be seen. However, OpenAI made a significant impact with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind this breakthrough have become some of Silicon Valley’s most sought-after talent. Notably, Mark Zuckerberg recruited five of the o1 researchers for Meta’s new superintelligence-focused unit, offering compensation packages exceeding $100 million. One of them, Shengjia Zhao, was recently appointed chief scientist of Meta Superintelligence Labs.

The rise of OpenAI’s reasoning models and agents is deeply connected to a machine learning training technique known as reinforcement learning (RL). RL provides AI models with feedback on the correctness of their choices within simulated environments. This technique has been in use for decades, famously demonstrated in 2016 when Google DeepMind’s AlphaGo, an AI system trained with RL, garnered global attention by defeating a world champion in the board game Go. Around the time of AlphaGo’s triumph, Andrej Karpathy, one of OpenAI’s first employees, began exploring how RL could be leveraged to create an AI agent capable of using a computer. However, it would take years for OpenAI to develop the necessary models and training techniques.
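For readers unfamiliar with the technique, a toy example helps. The sketch below implements the RL feedback loop in miniature: an agent takes actions in a tiny simulated environment, receives a reward only when a choice pays off, and gradually learns the correct behavior. The corridor environment and tabular Q-learning setup are textbook illustrations, not anything resembling OpenAI’s training stack.

```python
# Minimal RL illustration: an agent learns, from reward feedback alone,
# to walk right along a short corridor. Didactic sketch only.
import random

N_STATES = 6           # positions 0..5; reaching position 5 ends the episode
ACTIONS = [-1, +1]     # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q[state][action] estimates the long-term value of taking that action there.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):   # training episodes
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit what was learned, sometimes explore.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: Q[state][a])
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        # The environment's feedback on the agent's choice:
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Nudge the value estimate toward reward plus discounted future value.
        Q[state][action] += ALPHA * (
            reward + GAMMA * max(Q[next_state]) - Q[state][action]
        )
        state = next_state

# After training, the greedy policy at every position is "step right" (index 1).
print([max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```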

By 2018, OpenAI had developed GPT-1, the first large language model in its GPT series, pre-trained on vast amounts of internet data. While GPT models excelled at text processing, eventually leading to ChatGPT, they initially struggled with basic mathematics. In 2023, the company achieved a significant breakthrough, internally dubbed “Q*” and later “Strawberry,” by combining large language models (LLMs), reinforcement learning, and a technique called test-time computation. The latter gave the models additional time and processing power to plan and work through problems, verifying their steps before providing an answer. This innovation also enabled a new approach called “chain-of-thought” (CoT), which dramatically improved AI performance on unfamiliar math questions.
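To make “test-time computation” concrete, the sketch below spends extra compute at answer time: it samples several step-by-step attempts at a problem, verifies each attempt’s intermediate steps, and takes a majority vote over the survivors. The deliberately unreliable toy “solver” is an invented stand-in for a model’s chain of thought, not a description of Strawberry itself.

```python
# Toy illustration of test-time compute: sample many reasoning traces,
# verify each one's steps, and vote over the verified answers.
import random
from collections import Counter

def noisy_solver(a: int, b: int) -> list[int]:
    """Stand-in for a chain-of-thought trace: compute a + b as a running
    sum, recording each intermediate step, with occasional slips."""
    steps, total = [], 0
    for term in (a, b):
        total += term + random.choice([-1, 0, 0, 0, 1])  # sometimes slips
        steps.append(total)
    return steps

def verify(a: int, b: int, steps: list[int]) -> bool:
    """Step-checker: re-derive the running sum and compare every step."""
    return steps == [a, a + b]

def answer_with_test_time_compute(a: int, b: int, n_samples: int = 16) -> int:
    # Spend the extra compute budget: draw many independent attempts.
    attempts = [noisy_solver(a, b) for _ in range(n_samples)]
    # Keep only attempts whose intermediate steps survive verification.
    valid = [s[-1] for s in attempts if verify(a, b, s)]
    # Majority-vote the survivors (fall back to all attempts if none pass).
    pool = valid or [s[-1] for s in attempts]
    return Counter(pool).most_common(1)[0][0]

print(answer_with_test_time_compute(17, 25))  # almost always prints 42
```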

“I could see the model starting to reason,” noted El Kishky, an OpenAI researcher. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.” While the individual techniques weren’t entirely novel, OpenAI’s unique combination of them led directly to Strawberry, which in turn paved the way for o1. The company quickly recognized that the planning and fact-checking abilities of these AI reasoning models could be invaluable for powering AI agents. “We had solved a problem that I had been banging my head against for a couple of years,” Lightman recounted, describing it as one of the most exciting moments of his research career.

With the advent of AI reasoning models, OpenAI identified two new avenues for improving AI: applying more computational power during post-training and giving models more time and processing power when generating an answer. “OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” Lightman explained. Following the 2023 Strawberry breakthrough, OpenAI established an “Agents” team, led by researcher Daniel Selsam, to advance this new paradigm. This team’s work eventually integrated into the larger o1 reasoning model project, with key leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

Developing o1 required diverting precious resources, primarily talent and GPUs. Throughout OpenAI’s history, researchers have often had to negotiate for resources, and demonstrating breakthroughs was a proven method to secure them. “One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’” Some former employees suggest that the startup’s overarching mission to develop Artificial General Intelligence (AGI) was a key factor in achieving breakthroughs in AI reasoning models. By prioritizing the development of the smartest possible AI models over immediate productization, OpenAI was able to invest heavily in o1, a luxury not always afforded at competing AI labs. This decision to embrace new training methods proved prescient, as by late 2024, several leading AI labs began observing diminishing returns from models created through traditional pre-training scaling. Today, much of the AI field’s momentum stems from advances in reasoning models.

The notion of an AI “reasoning” raises philosophical questions. In many ways, the ultimate goal of AI research is to emulate human intelligence. Since o1’s launch, ChatGPT’s user experience has incorporated more human-sounding features like “thinking” and “reasoning.” When asked if OpenAI’s models truly reason, El Kishky offered a computer science perspective: “We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning.” Lightman focuses on the models’ results rather than on parallels to human brains. “If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” he said. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.” While OpenAI’s researchers acknowledge that their definitions of reasoning invite disagreement, and critics have indeed emerged, they contend that the models’ capabilities are what matter.

Other AI researchers tend to concur. Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes: both are human-made systems inspired by nature (human reasoning and bird flight, respectively), yet they operate through entirely different mechanisms, and that doesn’t diminish their utility or their ability to achieve similar outcomes. A recent position paper from researchers at OpenAI, Anthropic, and Google DeepMind agreed that AI reasoning models are not yet well understood and require further study. It may simply be too early to say definitively what occurs inside them.

Currently, AI agents on the market perform best in well-defined, verifiable domains such as coding. OpenAI’s Codex agent assists software engineers with simple coding tasks, while Anthropic’s models have gained popularity in AI coding tools like Cursor and Claude Code, becoming some of the first AI agents users are willing to pay for. However, general-purpose AI agents, such as OpenAI’s ChatGPT Agent and Perplexity’s Comet, still struggle with many complex, subjective tasks that people wish to automate. Attempts to use these tools for online shopping or finding long-term parking often result in extended processing times and “silly mistakes.”

These early agent systems are set to improve, but researchers must first figure out how to train the underlying models to complete more subjective tasks. “Like many problems in machine learning, it’s a data problem,” Lightman commented on the limitations of agents in subjective domains. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.” Noam Brown, an OpenAI researcher who contributed to both the IMO model and o1, explained that OpenAI has developed new general-purpose RL techniques for teaching AI models skills that are not easily verified. This approach was key to building the model that secured a gold medal at the IMO. That newer system spawns multiple agents that simultaneously explore various ideas before selecting the optimal answer, a pattern sketched below. This multi-agent approach is gaining traction, with Google and xAI recently releasing state-of-the-art models employing similar techniques. “I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” Brown stated. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”
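The multi-agent pattern Brown describes can be sketched in a few lines: fan out several workers that each pursue a different idea, then fan in and keep the best-scoring result. The `explore` and `score` helpers below are invented placeholders for real model calls; nothing here reflects OpenAI’s actual IMO system.

```python
# Hedged sketch of parallel exploration: several worker "agents" pursue
# different approaches concurrently, and a selector keeps the best result.
from concurrent.futures import ThreadPoolExecutor

def explore(problem: str, approach: str) -> str:
    """Stand-in: one agent works the problem along a given line of attack."""
    return f"solution to {problem!r} via {approach}"  # placeholder result

def score(candidate: str) -> float:
    """Stand-in: grade a candidate solution (e.g., with a verifier model)."""
    return float(len(candidate))  # placeholder heuristic

def solve_in_parallel(problem: str, approaches: list[str]) -> str:
    # Fan out: each approach is explored concurrently by its own worker.
    with ThreadPoolExecutor(max_workers=len(approaches)) as pool:
        candidates = list(pool.map(lambda a: explore(problem, a), approaches))
    # Fan in: select the best candidate instead of trusting any single agent.
    return max(candidates, key=score)

print(solve_in_parallel("toy inequality", ["induction", "AM-GM", "casework"]))
```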

These advancements may lead to performance gains in OpenAI’s upcoming GPT-5 model. OpenAI hopes GPT-5 will assert its dominance over competitors by offering the best AI model to power agents for both developers and consumers. Beyond raw capability, the company also aims to simplify product usage. El Kishky noted that OpenAI seeks to develop AI agents that intuitively understand user intent, eliminating the need for specific settings. The goal is to build AI systems that know when to utilize certain tools and how long to “reason” for a given task.
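A speculative sketch of what such routing might look like appears below: a dispatcher decides, per request, which tool to invoke and how large a “reasoning” budget to allocate. The tool names, keywords, and token budgets are invented for illustration; a real system would presumably learn these decisions rather than hard-code them.

```python
# Invented routing sketch: pick a tool and a "thinking" budget per request.
from dataclasses import dataclass

@dataclass
class Plan:
    tool: str               # which capability to invoke (hypothetical names)
    reasoning_tokens: int   # how long the model may "think" before answering

def route(request: str) -> Plan:
    text = request.lower()
    if any(w in text for w in ("prove", "integral", "equation")):
        return Plan("math_solver", 8192)   # hard problem: think for a long time
    if any(w in text for w in ("buy", "book", "find", "search")):
        return Plan("web_browser", 1024)   # action-oriented: browse, think less
    return Plan("chat", 256)               # simple request: answer directly

print(route("Prove the AM-GM inequality"))
print(route("Find long-term airport parking"))
```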

These ideas paint a picture of the ultimate ChatGPT: an agent capable of performing any task on the internet for you while intuitively understanding your preferences. That vision represents a significant evolution from the ChatGPT of today, and OpenAI’s research is clearly moving in this direction. While OpenAI undeniably led the AI industry a few years ago, the company now faces a formidable field of competitors. The crucial question is no longer just whether OpenAI can deliver its agentic future, but whether it can do so before rivals like Google, Anthropic, xAI, or Meta.
