OpenAI Releases Open-Weight gpt-oss Models for Local AI Deployment

InfoQ

OpenAI has unveiled gpt-oss-120b and gpt-oss-20b, its first open-weight language models since GPT-2, a significant step toward high-performance AI reasoning and tool use on local hardware. Both models are released under the permissive Apache 2.0 license, allowing broad adoption and modification.

The more powerful of the two, gpt-oss-120b, uses a mixture-of-experts (MoE) architecture that activates 5.1 billion of its 117 billion parameters per token. This design lets it match or even surpass OpenAI’s proprietary o4-mini model on core reasoning benchmarks while running on a single 80 GB GPU. Its smaller counterpart, gpt-oss-20b, is built for accessibility, activating 3.6 billion of its 21 billion parameters. Crucially, gpt-oss-20b runs on consumer-grade hardware with as little as 16 GB of memory, making it well suited to on-device inference and rapid development cycles without cloud infrastructure.
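To make the MoE idea concrete, the toy sketch below routes each token to a small top-k subset of experts, so only a fraction of the layer’s total parameters does work per token. The sizes, top-k value, and linear-layer experts here are illustrative assumptions, not gpt-oss’s actual configuration.

```python
import torch
import torch.nn as nn

d_model, n_experts, top_k = 64, 8, 2  # toy sizes, not gpt-oss's

experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
router = nn.Linear(d_model, n_experts)

def moe_forward(x):                               # x: (tokens, d_model)
    weights = router(x).softmax(dim=-1)           # routing probs per token
    top_w, top_idx = weights.topk(top_k, dim=-1)  # keep the best experts
    top_w = top_w / top_w.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(n_experts):
            mask = top_idx[:, slot] == e          # tokens routed to expert e
            if mask.any():
                out[mask] += top_w[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 64])
```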

Both models support advanced applications, including chain-of-thought reasoning, integrated tool use, and structured output generation. Developers can also set the model’s reasoning effort to low, medium, or high, tuning the trade-off between processing speed and accuracy for a given task.
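As a hedged illustration of that knob, the snippet below talks to a locally hosted, OpenAI-compatible endpoint; gpt-oss reads its reasoning level from the system message, while the base URL, API key, and served model name are assumptions that depend on your own server setup.

```python
from openai import OpenAI

# Assumed local endpoint (e.g. a vLLM or Ollama server hosting gpt-oss-20b);
# adjust base_url and model to match your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        # gpt-oss accepts low/medium/high; higher effort trades speed for accuracy.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "How many three-digit palindromes are divisible by 11?"},
    ],
)
print(response.choices[0].message.content)
```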

The gpt-oss models were trained with methodologies adapted from OpenAI’s internal o-series models, incorporating features such as rotary positional embeddings and grouped multi-query attention, and they support context lengths of up to 128k tokens. Extensive evaluations across diverse domains, including coding (Codeforces), health (HealthBench), mathematics, general knowledge (MMLU), and agentic tasks (TauBench), demonstrated robust capabilities even against closed models such as o4-mini and GPT-4o.
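Grouped multi-query attention is part of what keeps long contexts affordable: several query heads share a single key/value head, shrinking the KV cache. The sketch below illustrates the mechanism with made-up dimensions; it is not OpenAI’s implementation.

```python
import torch

batch, seq_len = 1, 8
n_q_heads, n_kv_heads, head_dim = 8, 2, 64  # 4 query heads per KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # far fewer KV heads
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # means a smaller cache

# Broadcast each KV head to its group of query heads, then attend as usual.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
out = scores.softmax(dim=-1) @ v
print(out.shape)  # torch.Size([1, 8, 8, 64])
```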

To foster research into model behavior and potential risks, OpenAI trained these models without applying direct supervision to their chain-of-thought reasoning, leaving the internal reasoning traces open for researchers to examine for issues such as bias or misuse. To proactively address safety concerns, OpenAI performed worst-case fine-tuning with adversarial data in the sensitive fields of biology and cybersecurity, and reported that even under these conditions the models did not reach high-risk capability levels under its Preparedness Framework. Findings from independent external expert reviewers also informed the final release. Further emphasizing its commitment to safety, OpenAI has launched a red teaming challenge with a $500,000 prize pool, inviting the community to rigorously test the models in real-world scenarios.

The gpt-oss models are now broadly available on Hugging Face and a range of other deployment services. The gpt-oss-20b model stands out for its minimal hardware requirements: after the initial download it runs fully offline, needing at least 16 GB of RAM (either VRAM or system memory). A MacBook Air with 16 GB of RAM can generate tens of tokens per second, while a modern GPU can reach hundreds. Microsoft is also broadening access to the 20B model with GPU-optimized builds for Windows via ONNX Runtime, available through Foundry Local and the AI Toolkit for VS Code.
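As a concrete starting point, the sketch below loads the 20B checkpoint with the Hugging Face transformers pipeline API. It assumes the published openai/gpt-oss-20b repository id and a machine with roughly 16 GB of GPU or unified memory; aside from the download, everything runs locally.

```python
# Minimal local-inference sketch; requires: pip install transformers accelerate torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # pick the best dtype the hardware supports
    device_map="auto",    # spread layers across available devices
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in two sentences."}]
output = generator(messages, max_new_tokens=200)

# The pipeline returns the whole chat; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```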