OpenAI Launches Open-Weight AI Models, Shifts Strategy
OpenAI has launched two new “open-weight” AI reasoning models, making them freely available for download on the developer platform Hugging Face. The company describes these models as “state-of-the-art” when evaluated against several benchmarks for comparable open models.
The release includes two distinct sizes: the more robust gpt-oss-120b, designed to operate on a single Nvidia GPU, and the lighter gpt-oss-20b, which can run on a consumer laptop equipped with 16GB of memory. This marks OpenAI’s first publicly released “open” language model since GPT-2, which debuted more than five years ago.
OpenAI indicated that these new open models are capable of sending complex queries to the company’s more powerful AI models hosted in the cloud. This hybrid approach means that if an open model cannot perform a specific task, such as processing an image, developers can connect it to one of OpenAI’s more capable, closed-source models.
While OpenAI initially embraced open-sourcing in its early days, the company has predominantly pursued a proprietary, closed-source development strategy. This approach has been instrumental in building a substantial business by selling API access to its AI models for enterprises and developers. However, CEO Sam Altman expressed in January his belief that OpenAI had been “on the wrong side of history” regarding open-sourcing its technologies.
The company now faces increasing competition from Chinese AI laboratories, including DeepSeek, Alibaba’s Qwen, and Moonshot AI, which have developed several of the world’s most capable and widely adopted open models. This shift comes as Meta’s Llama AI models, once dominant in the open AI space, have reportedly fallen behind in the past year. Furthermore, the Trump Administration urged U.S. AI developers in July to open source more technology to foster global AI adoption aligned with American values.
With the introduction of gpt-oss, OpenAI aims to garner support from both developers and the Trump Administration, both of whom have observed the growing prominence of Chinese AI labs in the open-source domain. Sam Altman stated, “OpenAI’s mission is to ensure AGI that benefits all of humanity. To that end, we are excited for the world to be building on an open AI stack created in the United States, based on democratic values, available for free to all and for wide benefit.”
Model Performance and Hallucination
OpenAI positioned its new open models as leaders among open-weight AI models, and published benchmark results to back that claim.
On Codeforces, a competitive programming benchmark (measured with tools enabled), gpt-oss-120b achieved an Elo rating of 2622, while gpt-oss-20b scored 2516. Both models outperformed DeepSeek’s R1 but lagged behind OpenAI’s o3 and o4-mini models.
Similarly, on Humanity’s Last Exam, a challenging test of crowd-sourced questions across various subjects (also with tools), gpt-oss-120b scored 19% and gpt-oss-20b scored 17.3%. These results indicate underperformance compared to o3 but superior performance to leading open models from DeepSeek and Qwen.
Notably, OpenAI’s new open models exhibit significantly higher rates of “hallucination” – generating incorrect or nonsensical information – compared to its latest proprietary AI reasoning models, o3 and o4-mini. OpenAI attributes this to smaller models possessing less “world knowledge” than larger frontier models. On PersonQA, OpenAI’s internal benchmark for measuring knowledge accuracy about people, gpt-oss-120b hallucinated in response to 49% of questions, and gpt-oss-20b to 53%. Those rates are more than triple that of OpenAI’s o1 model (16%) and higher than its o4-mini model (36%).
Training and Licensing
OpenAI stated that its open models were trained using processes similar to those for its proprietary models. Each open model incorporates a mixture-of-experts (MoE) architecture to efficiently activate fewer parameters for any given query. For instance, the gpt-oss-120b, which has 117 billion total parameters, activates only 5.1 billion parameters per token.
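The mixture-of-experts idea described above can be sketched in a few lines: a router scores all experts for each token, but only the top-k experts actually run, so most of the layer’s parameters stay untouched per token. The expert count, top-k value, and dimensions below are illustrative placeholders, not gpt-oss’s actual configuration.

```python
import numpy as np

# Minimal MoE routing sketch. All sizes are hypothetical, chosen only to
# illustrate sparse activation; gpt-oss's real configuration differs.
rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts in the layer (hypothetical)
TOP_K = 2       # experts activated per token (hypothetical)
D_MODEL = 16    # hidden dimension (hypothetical)

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS))  # routing weights

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router                    # score every expert
    top = np.argsort(logits)[-TOP_K:]      # keep only the k best-scoring
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS weight matrices are used for this token.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

token = rng.standard_normal(D_MODEL)
out, used = moe_forward(token)
print(f"activated {len(used)} of {N_EXPERTS} experts")  # activated 2 of 8 experts
```

At scale the same principle yields gpt-oss-120b’s ratio: 117 billion total parameters, but only about 5.1 billion active per token.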
The models also underwent high-compute reinforcement learning (RL) during their post-training phase. This process, which uses large clusters of Nvidia GPUs in simulated environments, teaches AI models to distinguish correct from incorrect responses. Similar to OpenAI’s o-series models, the new open models employ a “chain-of-thought” process, dedicating additional time and computational resources to formulate their answers. This post-training has enabled the open models to excel at powering AI agents, allowing them to call tools such as web search or Python code execution. However, OpenAI emphasized that these open models are text-only and cannot process or generate images and audio like some of the company’s other models.
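The agent loop described above can be sketched as follows: the agent calls the model, and when the model’s reply requests a tool (here a toy calculator stands in for web search or Python execution), the agent runs the tool and feeds the result back until the model produces a final answer. The message format and `fake_model` stub are invented for illustration and are not gpt-oss’s actual interface.

```python
# Hypothetical tool-calling loop; message schema and model stub are
# illustrative only, not gpt-oss's real API.

def fake_model(messages):
    """Stand-in for a reasoning model: first requests a tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "tool_call": {"name": "add", "args": [2, 3]}}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"role": "assistant", "content": f"The answer is {result}."}

TOOLS = {"add": lambda a, b: a + b}  # toy tool registry

def run_agent(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:                              # final answer reached
            return reply["content"]
        result = TOOLS[call["name"]](*call["args"])   # execute requested tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What is 2 + 3?"))  # The answer is 5.
```

In a real deployment the tool registry would expose things like a search API or a sandboxed Python interpreter, and the model itself would decide when to emit a tool call.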
OpenAI is releasing gpt-oss-120b and gpt-oss-20b under the Apache 2.0 license, widely considered one of the most permissive. This license permits enterprises to monetize OpenAI’s open models without requiring payment or permission from the company. However, unlike offerings from fully open-source AI labs such as AI2, OpenAI will not release the training data used to create these models. This decision aligns with the context of several active lawsuits against AI model providers, including OpenAI, alleging inappropriate training on copyrighted works.
Safety Considerations
The release of OpenAI’s open models was reportedly delayed multiple times in recent months, partly due to safety concerns. Beyond its standard safety protocols, OpenAI conducted investigations into whether malicious actors could fine-tune gpt-oss models to facilitate cyberattacks or the creation of biological or chemical weapons.
Following assessments by both OpenAI and third-party evaluators, the company concluded that gpt-oss might marginally increase biological capabilities. However, no evidence was found that these open models could reach a “high capability” threshold for danger in these domains, even after fine-tuning.
While OpenAI’s new models appear to be at the forefront among open-weight offerings, developers are also anticipating DeepSeek’s next AI reasoning model, R2, as well as a new open model from Meta’s superintelligence lab.