Zhipu AI's GLM-4.5: Advanced Reasoning, Coding, and Agentic AI

InfoQ

Zhipu AI has released GLM-4.5 and GLM-4.5-Air, two new models designed to excel at demanding tasks including complex reasoning, coding, and agentic operations. The models introduce a dual-mode system that lets them switch dynamically between deep analytical “thinking” for intricate problem-solving and rapid “non-thinking” responses for straightforward queries, aiming to optimize both accuracy and speed.

At its core, GLM-4.5 is a large model with 355 billion total parameters, of which 32 billion are active per token. Its lighter counterpart, GLM-4.5-Air, has 106 billion total and 12 billion active parameters. Both models use a Mixture-of-Experts (MoE) architecture, a design increasingly favored for its efficiency and scalability. Diverging from the “wider” approach seen in contemporaries such as DeepSeek-V3, GLM-4.5 emphasizes depth, incorporating 96 attention heads per layer. The models also integrate QK-Norm, Grouped Query Attention, Multi-Token Prediction, and the Muon optimizer, contributing to faster convergence during training and improved reasoning capabilities.
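The gap between total and active parameters follows from MoE routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. This is a minimal, toy sketch of top-k expert routing (not Zhipu AI's implementation; all shapes and names here are illustrative):

```python
import numpy as np

def topk_moe(x, gate_w, expert_ws, k=2):
    """Route one token through only the top-k of n experts.

    x: (d,) token activation; gate_w: (n_experts, d) router weights;
    expert_ws: list of (d, d) matrices standing in for full expert FFNs.
    """
    logits = gate_w @ x                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over selected experts only
    # Only k expert matrices are multiplied; the rest stay idle for this token.
    return sum(p * (expert_ws[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts, k = 8, 16, 2
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = topk_moe(x, gate_w, expert_ws, k)
active_frac = k / n_experts   # fraction of expert weights touched per token
```

The same principle, scaled up, is how GLM-4.5 keeps 355B parameters on disk while computing with roughly 32B per token.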

The training regimen for the new models was extensive, drawing on a 22-trillion-token corpus, of which 7 trillion tokens were dedicated to code and reasoning tasks. This foundational training was then augmented with reinforcement learning, powered by Zhipu AI’s proprietary “slime RL” infrastructure. The setup features an asynchronous agentic RL training pipeline designed to maximize throughput and handle long-horizon, multi-step tasks.

Initial performance reports from Zhipu AI indicate strong competitive standing. GLM-4.5 has secured the 3rd overall position on a comprehensive suite of 12 benchmarks, which collectively assess agentic tasks, reasoning, and coding proficiency. This places it directly behind the top-tier models from industry giants like OpenAI and Anthropic. GLM-4.5-Air also demonstrates impressive capabilities, ranking 6th and outperforming numerous models of comparable or even larger scale.

The models particularly shine in coding benchmarks. GLM-4.5 scored 64.2% on SWE-bench Verified and 37.5% on Terminal-Bench, placing it ahead of notable competitors such as Claude 4 Opus, GPT-4.1, and Gemini 2.5 Pro on several key metrics. Its tool-calling success rate further underscores its practical utility, reaching 90.6% and surpassing Claude-4-Sonnet (89.5%) and Kimi K2 (86.2%).

Early testers have echoed these positive assessments, praising GLM-4.5’s robust coding and agentic functionalities. Reports from Reddit users highlight GLM-4.5’s “excellent” performance in coding tasks, with GLM-4.5-Air noted for its effectiveness in agentic research and summarization benchmarks, even outperforming models like Qwen 3 235B-a22b 2507 in preliminary comparisons. Users have also commended the GLM series for its speed and impressive language proficiency, with an earlier iteration, GLM 4.1 Thinking Flash, scoring highly in French language testing.

For developers and enterprises, GLM-4.5 offers flexible accessibility. It can be directly accessed via Z.ai, invoked through the Z.ai API, or seamlessly integrated into existing coding agents such as Claude Code or Roo Code. For those preferring local deployment, model weights are readily available on popular platforms like Hugging Face and ModelScope, with support for vLLM and SGLang inference frameworks.
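For local deployment, a vLLM setup might look like the following. This is a sketch: the Hugging Face repo id, GPU count, and port are assumptions, so check the model card on Hugging Face for the exact id and hardware requirements.

```shell
# Serve the weights locally with vLLM's OpenAI-compatible server.
# Repo id and flags are illustrative, not confirmed by the article.
vllm serve zai-org/GLM-4.5-Air \
    --tensor-parallel-size 4   # shard across 4 GPUs

# Query it with any OpenAI-compatible client:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "zai-org/GLM-4.5-Air",
         "messages": [{"role": "user", "content": "Write a quicksort in Python."}]}'
```

Because vLLM exposes the OpenAI chat-completions schema, the same request shape works whether the model runs locally or behind a hosted endpoint.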