Huawei chip issues delay DeepSeek's R2 LLM, force Nvidia switch

The Register

The launch of DeepSeek’s next-generation large language model, R2, has reportedly been badly delayed by problems with Huawei’s homegrown AI chips. After the high-profile debut of its R1 model earlier this year, the Chinese AI developer came under considerable government pressure to train R1’s successor on Huawei’s domestic silicon.

However, after months of effort, including work alongside a dedicated team of Huawei engineers, DeepSeek hit obstacles it could not overcome. Sources close to the matter told the Financial Times that the Huawei chips proved unstable, their interconnects were glacially slow, and the accompanying software was too immature for effective training. Crucially, DeepSeek was unable to complete even a single successful training run on the Huawei hardware. That failure, compounded by difficulties with data labeling, ultimately forced the company to restart development, pivoting to Nvidia’s H20 GPUs for its core training runs. Huawei’s Ascend accelerators have reportedly been relegated to inference: running already-trained models, a far less demanding workload.

Huawei’s Ascend silicon, particularly the Ascend 910C that powers its CloudMatrix rack-scale platform, has recently attracted substantial attention as a domestic alternative to Western chips. While the precise revision used by DeepSeek remains undisclosed, the 910C boasts impressive specifications on paper: more onboard memory and over twice the BF16 floating-point performance, a key metric for AI workloads, of Nvidia’s H20. It lags slightly in memory bandwidth, but that is generally less critical for training than for inference, as the back-of-envelope sketch below illustrates.
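To see why, consider the rough arithmetic. Batch-of-one decoding has to stream every model weight from memory for each generated token, so inference throughput tracks memory bandwidth; training throughput instead tracks raw compute, at roughly six FLOPs per parameter per token. Every figure in this sketch is an illustrative assumption, not a Huawei or Nvidia spec:

```python
# Back-of-envelope: why bandwidth bounds inference while FLOPs bound training.
# All figures are illustrative assumptions, not vendor specifications.
params = 70e9            # hypothetical 70B-parameter model
bytes_per_param = 2      # BF16 weights
hbm_bandwidth = 4.0e12   # 4 TB/s of memory bandwidth (assumed)
bf16_compute = 150e12    # 150 TFLOPS of BF16 compute (assumed)

# Batch-1 decoding streams every weight once per generated token.
decode_tok_per_s = hbm_bandwidth / (params * bytes_per_param)

# Training needs roughly 6 FLOPs per parameter per token (forward + backward).
train_tok_per_s = bf16_compute / (6 * params)

print(f"decode: ~{decode_tok_per_s:.0f} tokens/s  (bandwidth-bound)")
print(f"train:  ~{train_tok_per_s:.0f} tokens/s  (compute-bound)")
```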

Despite these theoretical advantages, training a large language model is an exceptionally complex undertaking that extends far beyond the capabilities of a single chip. It involves distributing some of humanity’s most computationally intensive workloads across tens of thousands of processors. In such a distributed system, the failure of even a single component can necessitate restarting the entire process from the last stable checkpoint. For this reason, it is common for new entrants into the AI chip market to initially focus on inference, where the impact of a system failure is far less severe, while they iron out the complexities required to scale their technology for large-scale training. Huawei appears to be following this trajectory with its CloudMatrix rack systems, which are designed to simplify the deployment of extensive training clusters built on its chips.
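To make the checkpointing mechanics concrete, here is a minimal single-process sketch of the save-and-resume pattern; it assumes PyTorch, and the model, loss, path, and intervals are placeholders rather than anything DeepSeek actually runs. In a real job the same logic spans thousands of GPUs, which is exactly why one flaky component is so costly:

```python
# Minimal sketch of the checkpoint-and-resume pattern that bounds how much
# work is lost when a node in a training cluster fails. Assumes PyTorch;
# model, loss, path, and intervals are placeholders, not DeepSeek's setup.
import os
import torch
import torch.nn as nn

CKPT_PATH = "checkpoint.pt"  # hypothetical local path

model = nn.Linear(1024, 1024)  # stand-in for a real LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

start_step = 0
if os.path.exists(CKPT_PATH):
    # After a crash, workers reload the last stable checkpoint and replay
    # from there; everything computed since that point is simply lost.
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_step = ckpt["step"] + 1

for step in range(start_step, 2_000):
    x = torch.randn(32, 1024)
    loss = model(x).pow(2).mean()  # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 500 == 0:  # interval trades checkpoint I/O against lost work
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "step": step},
            CKPT_PATH,
        )
```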

DeepSeek’s existing training infrastructure was heavily optimized for Nvidia hardware, with much of its original V3 model (the basis for R1) trained using FP8, an efficient 8-bit data type. A switch to Huawei’s Ascend chips would have demanded significant retooling: an entirely different software stack, plus a fallback to more memory-hungry 16-bit data types, as Ascend accelerators do not support FP8. Whatever the strategic appeal of training a frontier model on homegrown Chinese silicon, those concessions underline how hard such a transition is.
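The memory cost of that concession is easy to put a number on. A minimal sketch follows; the parameter count is DeepSeek V3’s publicly reported total, and the rest is plain arithmetic:

```python
# Weight-memory cost of moving from FP8 to a 16-bit type such as BF16.
# Parameter count is DeepSeek V3's reported total; the rest is arithmetic.
params = 671e9                  # ~671B parameters (as reported by DeepSeek)
fp8_bytes, bf16_bytes = 1, 2    # bytes per parameter

fp8_gb = params * fp8_bytes / 1e9
bf16_gb = params * bf16_bytes / 1e9

print(f"FP8 weights:  ~{fp8_gb:,.0f} GB")    # ~671 GB
print(f"BF16 weights: ~{bf16_gb:,.0f} GB")   # ~1,342 GB; double the footprint
# Gradients and activations held in 16-bit grow similarly, multiplying the
# number of accelerators needed just to hold the training state in memory.
```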

One possible explanation for the specific mention of R2, rather than a V4 iteration, is that DeepSeek intended to use Huawei’s Ascend accelerators primarily for the reinforcement learning phase of training. That phase leans heavily on inference, as it involves generating and scoring vast quantities of tokens (the basic units of text) to imbue an existing base model with reasoning capabilities; the sketch at the end of this piece shows the shape of such a loop.

The news comes just days after Bloomberg reported that Chinese authorities have begun discouraging model developers from using Nvidia’s H20 accelerators, particularly for sensitive government projects, underscoring the geopolitical crosscurrents shaping the global AI chip landscape.
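For the curious, here is a schematic of why that reinforcement learning phase is inference-heavy. Every component below is a toy stand-in, and no specific DeepSeek recipe is implied: each update step first generates a large batch of sampled completions (pure inference), scores them, and only then spends comparatively modest compute on the parameter update itself:

```python
# Schematic RL post-training loop: token generation (inference) dominates.
# All components are toy stand-ins; no specific DeepSeek recipe is implied.
import random

def generate(prompt: str, max_tokens: int = 256) -> list[int]:
    # Stand-in for sampling a completion from the base model: inference work.
    return [random.randint(0, 50_000) for _ in range(max_tokens)]

def reward(tokens: list[int]) -> float:
    # Stand-in for a verifier or reward model scoring the rollout.
    return float(sum(tokens) % 2)

prompts = ["prompt-1", "prompt-2", "prompt-3"]
for step in range(3):
    # Many sampled completions per prompt: the bulk of the tokens (and FLOPs)
    # in this phase are produced here, by inference, not by gradient math.
    rollouts = [(p, generate(p)) for p in prompts for _ in range(8)]
    scored = [(p, toks, reward(toks)) for p, toks in rollouts]
    mean_r = sum(r for _, _, r in scored) / len(scored)
    print(f"step {step}: {len(scored)} rollouts, mean reward {mean_r:.2f}")
    # A real system would follow with a policy-gradient update over these
    # rollouts; that update is the comparatively small training slice.
```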