Researcher transforms OpenAI's gpt-oss-20b into a raw, uncensored base model

VentureBeat

Less than two weeks after OpenAI released its powerful new gpt-oss family of large language models, the first open-weights models from the company since 2019, developers are already reshaping them. One striking example comes from Jack Morris, a Cornell Tech PhD student and researcher at Meta, who recently unveiled gpt-oss-20b-base. This reworked version of OpenAI’s smaller gpt-oss-20b model strips away its built-in reasoning behavior, returning it to something much closer to a raw, pre-trained state that offers faster, freer, and less constrained responses. The model is now available on Hugging Face under a permissive MIT License, allowing for both further research and commercial applications.

To understand Morris’s innovation, it is crucial to distinguish between OpenAI’s release and what artificial intelligence researchers call a “base model.” Most large language models offered by leading AI laboratories, including OpenAI, Anthropic, Google, and open-source players like Meta and DeepSeek, are “post-trained.” This means they have undergone an additional phase where they are exposed to curated examples of desired behavior. For instruction-tuned models, this involves providing numerous examples of instructions paired with ideal responses, teaching the AI to respond more helpfully, politely, or safely to natural language requests.

OpenAI’s gpt-oss models, released on August 5, were “reasoning-optimized.” They were trained and fine-tuned not merely to predict the next word, but to follow instructions in a safe and consistent manner, often employing structured “chain of thought” reasoning to work through problems before producing a final answer. This approach, which OpenAI first introduced with its o1 model almost a year ago, has been widely adopted across the industry. It forces models to “think” longer over multiple steps and check their own work, making them better suited for tasks such as coding, solving mathematical problems, or answering factual questions with explanations. However, this also means their responses are filtered and steered away from content deemed unsafe or undesirable.

In contrast, a base model is the raw, pre-trained version of a large language model before any such reasoning-specific alignment is applied. Base models simply attempt to predict the most probable next words given the preceding text, without built-in guardrails, stylistic preferences, or refusal behaviors. They are highly valued by some researchers because they can produce more varied and less constrained output. Studying their unfiltered behavior can also reveal deeper insights into how models store knowledge and patterns derived from their training data.
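
Concretely, a base model defines nothing more than an autoregressive distribution over token sequences, factorized one token at a time:

\[
p_\theta(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p_\theta\big(x_t \mid x_1, \dots, x_{t-1}\big)
\]

Every behavior it exhibits, helpful or otherwise, falls out of maximizing the likelihood of its pretraining text under this single objective.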

Morris’s objective was to “reverse” OpenAI’s alignment process, restoring the smaller gpt-oss-20B to a state much closer to its original pre-trained form. As he explained in an X thread announcing the project, “We basically reversed the alignment part of LLM training, so we have something that produces natural-looking text again. It doesn’t engage in CoT anymore. It is back to a model that just predicts the next token on generic text.”

Instead of attempting to bypass the model’s safety filters with clever prompts, which Morris found ineffective in early experiments, he pursued a different strategy after a conversation with John Schulman, a former OpenAI co-founder and current chief scientist at Thinking Machines. The core idea was to treat alignment reversal as a small optimization problem: if most of the model’s pre-trained knowledge remained within its internal settings (weights), then only a small, low-rank update might be needed to nudge it back toward base model behavior.
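
Put in more formal terms (a reconstruction from the article’s description, not Morris’s own notation), the search is for a rank-limited weight update that minimizes ordinary next-token cross-entropy on pretraining-style text:

\[
\min_{\Delta W:\ \operatorname{rank}(\Delta W) \le 16} \;\; \mathbb{E}_{x \sim \mathcal{D}_{\text{pretrain}}} \Big[ -\sum_{t} \log p_{\theta_{\text{aligned}} + \Delta W}\big(x_t \mid x_{<t}\big) \Big]
\]

If alignment only nudged the pretrained weights, a small \(\Delta W\) should suffice to undo most of its behavioral effects.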

Morris implemented this by applying a low-rank adaptation (LoRA) update to just three layers of the model (the MLP layers at positions 7, 15, and 23) with a rank of 16. This meant training approximately 60 million parameters, a mere 0.3% of the model’s 21 billion total. He used around 20,000 documents from the FineWeb dataset, keeping the format as close as possible to the original pretraining so the model would not learn new information but instead reactivate its broad free-text generation capabilities. Training took four days on eight NVIDIA H200 GPUs, with a learning rate of 2e-6, a batch size of 16, and a maximum sequence length of 8,192 tokens. Afterward, Morris merged the LoRA weights back into the model, allowing users to run it as a standalone, fully fine-tuned artifact. He also worked around the limitations of current open tools for fine-tuning Mixture-of-Experts (MoE) architectures like gpt-oss, building his own harness to checkpoint progress frequently and skip data batches that risked overloading GPU memory.
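
As a rough illustration only (Morris described his setup but has not published it in this form, and the module naming for gpt-oss’s MoE blocks is an assumption), a similarly targeted LoRA could be declared with Hugging Face’s peft library along these lines:

```python
# Hypothetical sketch of the LoRA setup described above -- not Morris's code.
# The module name "mlp" and the lora_alpha value are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",            # OpenAI's released checkpoint
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                            # rank-16 update, as in the article
    lora_alpha=32,                   # scaling factor (assumed; not reported)
    target_modules=["mlp"],          # assumed name of the MLP block
    layers_to_transform=[7, 15, 23], # adapt only these three layers
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # should be on the order of 0.3%

# ... fine-tune on ~20k FineWeb documents (lr 2e-6, batch 16, seq len 8,192) ...

# Fold the adapter back into the base weights, producing a standalone model.
model = model.merge_and_unload()
```

In practice, as the article notes, off-the-shelf tooling struggled with the MoE architecture, which is why Morris built his own training harness.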

It is important to note Morris’s clarification in response to community questions: he did not recover the base model’s original weights, which govern the behavior of its artificial neurons. Instead, he states his work “recovered the base model’s distribution with some error”—meaning the probability patterns the model uses to generate outputs—even if the underlying weights producing those patterns may differ.
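
Stated formally (again an interpretation of Morris’s phrasing rather than a claim he published), the recovered weights \(\theta'\) induce approximately the same distribution over text as the original base weights, without the parameters themselves matching:

\[
p_{\theta'}(x) \;\approx\; p_{\theta_{\text{base}}}(x) \ \text{ for typical sequences } x, \qquad \text{even though} \quad \theta' \neq \theta_{\text{base}}
\]

Many distinct weight configurations can realize nearly identical next-token distributions, so distributional recovery is a weaker, and more testable, claim than exact weight recovery.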

The resulting gpt-oss-20b-base exhibits noticeably freer outputs. It no longer defaults to explaining reasoning step-by-step and will produce a wider range of responses, including instructions OpenAI’s aligned model would typically refuse, such as detailing how to build a weapon, listing profanity, or planning illegal activities. In brief tests, Morris also found it could reproduce verbatim passages from copyrighted works, including three out of six book excerpts he attempted, indicating that some memorized material remains accessible. Despite this, some traces of alignment persist; if prompted in an assistant-style format, the model may still occasionally act like a polite chatbot. When run through the original gpt-oss chat template, it can still perform reasoning tasks, albeit with some loss in quality. For optimal results in free-text mode, Morris advises prepending prompts with the model’s special beginning-of-sequence token and avoiding chat templates entirely.
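
In code, that advice translates to plain generation with no chat template (a minimal sketch; the repository path below is a placeholder rather than the verified Hugging Face id):

```python
# Minimal usage sketch following Morris's advice: free-text generation with the
# beginning-of-sequence token prepended and no chat template applied.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<author>/gpt-oss-20b-base"  # placeholder; substitute the actual Hugging Face path
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Prepend the BOS token manually and disable automatic special tokens so it
# is not added twice.
prompt = tokenizer.bos_token + "The printing press changed Europe because"
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```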

The gpt-oss family, comprising the gpt-oss-120b and gpt-oss-20b models, debuted to considerable attention. These text-only, multilingual models are built on a Mixture-of-Experts Transformer architecture and were released under the permissive Apache 2.0 license, permitting unrestricted local use, fine-tuning, and commercial deployment. OpenAI’s performance benchmarks indicated that the larger 120b model matched or exceeded its proprietary o4-mini in reasoning and tool-use tasks, while the smaller 20b proved competitive with o3-mini. This marked OpenAI’s first open-weights release in six years, a move widely interpreted as a response to competitive pressure from other open-weights providers, including China’s DeepSeek R1 and Qwen 3. The company positioned gpt-oss as both a means to re-engage developers who had migrated to rival open-source models and a platform for safety research into open-weight systems.

Developer reaction to OpenAI’s gpt-oss models was mixed. Supporters praised the permissive license, efficiency, and strong showing on STEM benchmarks, with Hugging Face CEO Clem Delangue calling it a “meaningful addition to the open ecosystem.” Critics, however, argued that the models appeared heavily trained on synthetic data, making them excellent at math and coding but less capable in creative writing, general world knowledge, and multilingual reasoning. Some early testers also raised concerns about lingering safety filters and potential geopolitical bias.

Against this backdrop, Morris’s gpt-oss-20b-base stands out as a concrete example of how open-weight models can be adapted and repurposed in the wild within days of their release. In stark contrast to the divided reception of OpenAI’s gpt-oss, reactions to Morris’s work have been overwhelmingly positive, with one computer scientist on X calling it “the coolest thing I’ve seen on Twitter [X] in the past few months.” The approach strips away much of the behavior OpenAI carefully built in, returning the model to something closer to a raw, pre-trained system: invaluable for researchers studying memorization, bias, or the impact of alignment, but inherently carrying higher safety risks. Morris intends to continue his research into restoring reasoning models to their pre-trained, non-reasoning base forms by testing his extraction method on other instruct models, such as those offered by Qwen.
