OpenAI Launches First Open-Weight LLMs Since GPT-2 with GPT-OSS
OpenAI has launched GPT-OSS, its first set of open-weight language models since GPT-2, marking a significant shift in its approach to model accessibility. The new models are available under the permissive Apache 2.0 license, which gives developers broad freedom for commercial and non-commercial use without restrictive clauses.
The GPT-OSS series debuts with two models: a 120-billion-parameter reasoning model and a more compact 20-billion-parameter version. OpenAI states that the larger model delivers performance comparable to its proprietary o4-mini, while the smaller variant achieves results akin to o3-mini.
These models were trained primarily on a vast corpus of English text, with a particular focus on STEM subjects, coding, and general knowledge. Unlike some of OpenAI's larger, more advanced models such as GPT-4o, GPT-OSS does not incorporate vision capabilities. During post-training, OpenAI applied reinforcement learning techniques, similar to those used for its o4-mini model, to give GPT-OSS chain-of-thought reasoning abilities. Users can adjust the models' reasoning effort (low, medium, or high) through the system prompt, as sketched below.
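In practice, the directive can be placed directly in the system message. The following is a minimal sketch using the Hugging Face Transformers chat pipeline; the openai/gpt-oss-20b model id matches the Hugging Face listing, but the exact "Reasoning: high" syntax is an assumption based on the release materials:

```python
# Minimal sketch: steering GPT-OSS reasoning effort via the system prompt.
# Assumes the chat template honors a "Reasoning: <level>" directive as
# described in the release materials; the exact phrasing may differ.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # 20B variant from the Hugging Face listing
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reasoning: high"},  # low | medium | high
    {"role": "user", "content": "Prove that the sum of two odd integers is even."},
]

result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
```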
Both GPT-OSS models use a Mixture of Experts (MoE) architecture, a design choice that improves efficiency. The 120-billion-parameter model contains 128 specialized sub-networks, or "experts," of which four (totaling 5.1 billion active parameters) contribute to each output token. The 20-billion-parameter version is a streamlined design with 32 experts and 3.6 billion active parameters. Because only a fraction of the network runs per token, an MoE model generates tokens faster than a dense model of equivalent total size, provided the hardware can hold the full set of experts in memory.
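To make the routing mechanism concrete, the sketch below implements generic top-k expert selection in PyTorch. It illustrates the technique only; it is not OpenAI's implementation, and the layer sizes are arbitrary:

```python
# Illustrative top-k Mixture-of-Experts routing (not OpenAI's code): a router
# scores every expert per token, and only the k best-scoring experts run.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Keep only the k highest-scoring experts per token.
        scores, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

# Only k of n_experts run per token, so active parameters (and FLOPs) stay far
# below the total parameter count -- the source of MoE's speed advantage.
moe = TopKMoE(d_model=64, d_hidden=256, n_experts=8, k=2)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```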
OpenAI has optimized both models to run on relatively modest hardware: the MoE weights ship natively quantized in MXFP4, so the 120-billion-parameter model runs on a single 80GB H100 GPU, while the 20-billion-parameter version fits within just 16GB of VRAM. In preliminary testing, GPT-OSS-20B on an RTX 6000 Ada GPU sustained token generation rates exceeding 125 tokens per second at a batch size of one.
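A rough way to reproduce such a measurement with Hugging Face Transformers is sketched below. The model id and generation settings are assumptions, and a rigorous benchmark would also account for warmup and prompt length:

```python
# Back-of-envelope throughput check mirroring the batch-size-1 test above.
# A sketch, not a rigorous benchmark; model id and settings are assumptions.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed Hugging Face repository name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on the available GPU(s)
)

inputs = tok("Explain mixture-of-experts routing.", return_tensors="pt").to(model.device)

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

generated = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{generated / elapsed:.1f} tokens/sec at batch size 1")
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```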
The models feature a native context window of 128,000 tokens. While competitive a year ago, this capacity is now surpassed by some rivals, such as Alibaba's Qwen3 family, which offers a 256,000-token context window, and Meta's Llama 4, supporting up to 10 million tokens.
The release of GPT-OSS follows multiple delays, which OpenAI CEO Sam Altman attributed to extensive safety evaluations. In a recent blog post, OpenAI detailed the safety measures implemented, including filtering out harmful data related to chemical, biological, radiological, or nuclear (CBRN) research and development. The models have also been designed to resist unsafe prompts and prompt injection attempts. OpenAI acknowledged the risk of adversaries fine-tuning open-weight models for malicious purposes but expressed confidence in its safeguards. To further test these measures, the company has launched a red-teaming challenge, offering a half-million-dollar prize to anyone who can identify novel safety vulnerabilities.
GPT-OSS is currently available on model repositories such as Hugging Face and is supported by a wide range of inference stacks, including Hugging Face Transformers, PyTorch, Triton, vLLM, Ollama, and LM Studio.
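As an example, serving the 20-billion-parameter model through vLLM's offline Python API might look like the following sketch, again assuming the openai/gpt-oss-20b identifier from the Hugging Face listing:

```python
# Sketch: offline chat with GPT-OSS-20B via vLLM. The model id is assumed
# from the Hugging Face listing; sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")
params = SamplingParams(temperature=0.7, max_tokens=256)

conversation = [
    {"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}
]
outputs = llm.chat(conversation, params)
print(outputs[0].outputs[0].text)
```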
Looking ahead, Sam Altman hinted at further developments, stating on X that a "big upgrade" is expected later this week, fueling speculation about a potential GPT-5 release.