Open-Source AI Models: Higher Long-Term Costs Due to Token Inefficiency
As businesses increasingly integrate artificial intelligence into their operations, a critical decision arises: whether to adopt open-source or proprietary AI models. While open-source options often appear more economical at first glance, a recent study by Nous Research suggests that these initial savings can quickly erode due to their higher demand for computing power. The findings, published this week, indicate that open-source AI models typically consume significantly more computational resources than their closed-source rivals when performing identical tasks.
To quantify this resource consumption, researchers at Nous Research tested dozens of AI models, including closed systems from industry giants like Google and OpenAI alongside open-source alternatives from developers such as DeepSeek and Magistral. They measured the computing effort each model required to complete a range of tasks, grouped into simple knowledge questions, mathematical problems, and logic puzzles. The primary metric was the number of “tokens” each model used to process and generate responses.
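The study's core metric can be illustrated with a toy calculation. The sketch below uses made-up token counts (not the study's data) to show how an open-to-closed token ratio might be computed per task category:

```python
from statistics import mean

# Hypothetical token counts per completed task -- illustrative only,
# not figures from the Nous Research study.
results = {
    ("open",   "knowledge"): [900, 1100, 1000],
    ("closed", "knowledge"): [100, 120, 110],
    ("open",   "math"):      [2000, 2200],
    ("closed", "math"):      [1100, 1150],
}

def token_ratio(category: str) -> float:
    """Average open-model tokens divided by average closed-model tokens
    for one task category."""
    return mean(results[("open", category)]) / mean(results[("closed", category)])

print(round(token_ratio("knowledge"), 1))  # widest gap on knowledge questions
```

On these invented numbers the knowledge-question ratio is roughly 9x, echoing the study's finding that the simplest tasks show the largest disparity.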
In the realm of artificial intelligence, a token represents the smallest unit of text or data that a model processes—it could be a word, a fragment of a word, or even punctuation. AI models understand and generate language by processing these tokens sequentially. Consequently, a higher token count for a given task directly translates to increased computing power and longer processing times. The study highlighted a striking disparity: “Open-weight models use 1.5–4 times more tokens than closed ones—and up to 10 times for simple knowledge questions—making them sometimes more expensive per query despite lower per-token costs,” the authors noted.
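As a rough illustration of what a token count measures, the sketch below splits text into words and punctuation marks. Real tokenizers operate on subword units, so actual counts will differ; this is only an approximation of the idea:

```python
import re

def rough_token_count(text: str) -> int:
    """Crude proxy for a model's token count: counts words and punctuation.
    Production tokenizers split text into subword units, so real counts
    are usually somewhat higher for uncommon words."""
    return len(re.findall(r"\w+|[^\w\s]", text))

# Six words plus a period -> 7 pieces.
print(rough_token_count("Paris is the capital of France."))
```

Every one of those pieces must be processed (and, for output, generated) sequentially, which is why token count is a workable proxy for compute.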
This efficiency gap carries significant implications for companies deploying AI. First, although the direct hosting costs for open-weight models might be lower, this advantage can be swiftly nullified if the models demand substantially more tokens to analyze and solve a problem. Second, an elevated token count directly leads to prolonged generation times and increased latency, which can be detrimental for applications requiring rapid responses. Since most closed-source models do not disclose their internal reasoning processes or “chain of thought,” the researchers relied on the total output tokens—which include both the model’s internal processing and its final answer—as a reliable proxy for the computational effort expended.
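The trade-off can be made concrete with back-of-the-envelope arithmetic. The figures below are purely illustrative, not actual vendor pricing or study numbers: even at a third of the per-token price, a model that spends four times as many tokens costs more per query.

```python
# Illustrative figures only -- not real vendor pricing or study data.
OPEN_PRICE_PER_TOKEN = 0.20 / 1_000_000    # cheaper per token
CLOSED_PRICE_PER_TOKEN = 0.60 / 1_000_000  # 3x the per-token price

closed_tokens = 500               # tokens a closed model spends on one query
open_tokens = 4 * closed_tokens   # open model uses 4x tokens (study's upper bound)

open_cost = open_tokens * OPEN_PRICE_PER_TOKEN
closed_cost = closed_tokens * CLOSED_PRICE_PER_TOKEN

# The 4x token usage erases the 3x per-token price advantage.
print(f"open: ${open_cost:.6f}  closed: ${closed_cost:.6f}")
```

The same arithmetic compounds with latency: four times the output tokens also means roughly four times the generation time per query.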
The research demonstrated that open-source models consistently required more tokens than their closed counterparts for the same tasks. For simple knowledge questions, open models sometimes used three times as many tokens. While this gap narrowed for more complex mathematical and logic problems, open models still consumed nearly twice as many tokens. The study posited that closed models, such as OpenAI's models and xAI's Grok-4, appear to be optimized for token efficiency, likely to minimize operational costs. In contrast, open models like DeepSeek and Qwen, while consuming more tokens, may do so to support more robust reasoning processes.
Among the open-source models evaluated, llama-3.3-nemotron-super-49b-v1 emerged as the most token-efficient, whereas Magistral models proved to be the least efficient. OpenAI’s offerings, particularly its o4-mini and the newer open-weight gpt-oss models, showcased remarkable token efficiency, especially when tackling mathematical problems. The researchers specifically pointed to OpenAI’s gpt-oss models, with their concise internal reasoning chains, as a potential benchmark for improving token efficiency across the broader landscape of open-source AI models.
Ultimately, the study underscores a crucial consideration for businesses: the true cost of an AI model extends far beyond its initial licensing or deployment fees. The long-term operational expenses, heavily influenced by computational resource consumption, can quickly turn a seemingly cheaper open-source option into a more costly endeavor over time.