Zhipu AI Launches GLM-4.5 Model with Enhanced Efficiency and Multi-Ability Integration

Introduction

On July 28, 2025, Zhipu AI released its flagship model, GLM-4.5, as open source. GLM-4.5 is a foundation model built specifically for agent applications, and it stands out for strong performance, low cost, and the integration of multiple abilities in a single model.

Core Team and Development

Zhipu AI’s core team comes primarily from the Knowledge Engineering Group (KEG) laboratory at Tsinghua University. Key members include Chairman Liu Debing, CEO Zhang Peng, and President Wang Shaolan, all core members of the KEG lab. Zhang and Wang both hold doctorates from Tsinghua’s Innovation Leadership Engineering program, while Chief Scientist Tang Jie previously served as a professor in Tsinghua University’s Department of Computer Science.

The GLM series has gone through more than four years of iteration. It began in 2021 with the early GLM model (10B), which explored optimizations to the Transformer architecture. GLM-130B followed in 2022 with 130 billion parameters, and in 2023 GLM-3 introduced a lightweight design based on a mixture-of-experts (MoE) architecture, laying the groundwork for later gains in parameter efficiency.

Technical Innovations

The GLM series of large language models (LLMs) is built on the Transformer architecture. GLM-130B uses DeepNorm, a normalization method designed to stabilize the training of very deep Transformers, as its layer normalization strategy. It encodes token positions with rotary position embeddings (RoPE), which are applied to the queries and keys in the attention layers, and its feedforward network (FFN) uses gated linear units (GLU) with GeLU activation (GeGLU) to strengthen the model’s feature selection and processing.
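To make these three components concrete, here is a minimal pure-Python sketch operating on single vectors. It assumes the DeepNorm form LayerNorm(α·x + sublayer(x)) with α = (2N)^(1/2), where N is the number of layers, as the GLM-130B paper describes it; a GeGLU feedforward block; and RoPE in one common convention (rotating adjacent pairs of dimensions). The helper names are hypothetical, learned LayerNorm parameters are omitted, and this is not GLM’s actual code.

```python
import math

# Illustrative sketch only: helper names are hypothetical, not from GLM's codebase.


def layer_norm(x, eps=1e-5):
    """LayerNorm over one vector, without the learned gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x, sublayer_out, num_layers):
    """DeepNorm residual: LayerNorm(alpha * x + sublayer(x)).

    Assumes alpha = sqrt(2 * num_layers), the value reported for GLM-130B.
    Scaling up the skip connection relative to the sublayer output keeps
    very deep Post-LN stacks stable.
    """
    alpha = math.sqrt(2 * num_layers)
    return layer_norm([alpha * a + b for a, b in zip(x, sublayer_out)])


def gelu(v):
    """GeLU, tanh approximation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def geglu_ffn(x, w_gate, w_up, w_down):
    """GeGLU feedforward: W_down @ (GeLU(W_gate @ x) * (W_up @ x)).

    The GeLU branch acts as a soft gate that decides which features of the
    linear branch pass through.
    """
    gate = [gelu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def rope(x, position, base=10000.0):
    """Rotary position embedding for an even-length query or key vector.

    One common convention: each adjacent pair (x[i], x[i+1]) is rotated by
    the angle position * base^(-i/d). Rotations preserve the vector's norm.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Because RoPE rotates queries and keys inside attention, the relative position of two tokens shows up in their dot product; the FFN itself carries no positional encoding.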

GLM-3 adopted a distinctive multi-stage enhanced pre-training method and drew on recent techniques for efficient dynamic inference and memory optimization. Its inference framework ran 2 to 3 times faster, at half the inference cost, than the best open-source implementations available at the time. These results point to substantial progress in architecture optimization, and the work also built up long-term data on how feature distributions differ across tasks.

Parameter Efficiency Strategy

Unlike many teams that focus on increasing parameter counts, Zhipu AI has followed an “efficient parameter” approach since GLM-2, optimizing how expert modules work together instead of blindly expanding the total parameter count. GLM-4.5, for instance, has 355 billion total parameters, of which 32 billion (about 9%) are active for any given token. Each expert module is dedicated to specific tasks (for example, code modules focus on Python and JavaScript, and reasoning modules on mathematics and logic), and the modules are connected through lightweight routing layers, avoiding the redundancy of dense architectures.
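The routing idea can be illustrated with a generic top-k mixture-of-experts layer: a gate scores every expert, only the k best-scoring experts run for a given token, and their outputs are mixed using gate weights renormalized over the selected experts. This is a minimal sketch under that assumption; the number of experts, the value of k, and the toy experts are hypothetical and do not describe GLM-4.5’s actual configuration.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Return [(expert_index, weight)] for the k highest-probability experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]


def moe_forward(x, experts, gate_logits, k):
    """Run only the k routed experts; all other experts' parameters stay idle."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # in a real model, produced by a learned router

print(top_k_route(gate_logits, k=2))  # experts 1 and 3 are selected
print(moe_forward([1.0, 1.0], experts, gate_logits, k=2))
```

With four experts and k = 2, only half of the expert parameters are touched per token; the same principle at a much larger scale is what allows a model to keep most of its parameters idle on any given token.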

The GLM-4.5-Air variant has 106 billion total parameters and 12 billion active parameters, an active ratio of about 11%. This approach demands a finer-grained breakdown of task types; some teams instead opt for more conservative dense architectures, wary of the added complexity and longer development cycles.
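The two active-parameter ratios quoted for GLM-4.5 and GLM-4.5-Air can be checked with a few lines of arithmetic, using only the parameter counts given in this article:

```python
def active_ratio(total_params_b, active_params_b):
    """Fraction of parameters used per token (both counts in billions)."""
    return active_params_b / total_params_b


# Parameter counts as stated in this article (billions).
variants = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}

for name, (total, active) in variants.items():
    print(f"{name}: {active}B of {total}B active = {active_ratio(total, active):.1%}")
# GLM-4.5: 32B of 355B active = 9.0%
# GLM-4.5-Air: 12B of 106B active = 11.3%
```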

The active parameter ratio is crucial to the commercial cost of inference: the lower the ratio, the fewer parameters take part in computing each token, so less computation is spent per token and inference costs fall. GLM-4.5 achieves what Zhipu AI calls “double the parameter efficiency,” with API prices only one-tenth of Claude’s (Anthropic’s large model family) and generation speeds above 100 tokens per second, both made possible by its low active parameter ratio.
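A common back-of-the-envelope rule estimates a Transformer forward pass at roughly 2 FLOPs per parameter used per token (one multiply and one add). Applying it to the active rather than the total parameter count shows why a low active ratio cuts compute. This is a rough sketch: it ignores attention cost that grows with context length, and all 355 billion parameters still have to be held in memory.

```python
def forward_flops_per_token(params_used):
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per parameter used per token."""
    return 2 * params_used


TOTAL_PARAMS = 355e9   # GLM-4.5 total parameters (from this article)
ACTIVE_PARAMS = 32e9   # parameters active per token (from this article)

dense_equivalent = forward_flops_per_token(TOTAL_PARAMS)  # if every parameter ran
moe_actual = forward_flops_per_token(ACTIVE_PARAMS)       # only routed experts run

print(f"dense-equivalent: {dense_equivalent / 1e9:.0f} GFLOPs per token")
print(f"active-only:      {moe_actual / 1e9:.0f} GFLOPs per token")
print(f"compute reduction: {dense_equivalent / moe_actual:.1f}x")
# dense-equivalent: 710 GFLOPs per token
# active-only:      64 GFLOPs per token
# compute reduction: 11.1x
```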

Training Data and Annotation

GLM-4.5’s training data uses a two-layer “general + vertical” structure: the base layer consists of 15 trillion tokens of general text, and the upper layer adds 8 trillion tokens of vertical-domain data, annotated separately for three task categories: reasoning, coding, and intelligent agents. The annotation goes beyond simple categorization; each task has its own training objective, such as logical-chain completeness for reasoning tasks and syntactic correctness for coding tasks.
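The idea of task-specific objectives, rather than a single category label, can be sketched as per-task quality checks applied to vertical-domain samples. Everything here is a hypothetical illustration, not Zhipu AI’s pipeline: the `Sample` type and the check functions are invented for this example, coding samples are assumed to be Python and validated with the standard library’s `ast.parse`, and the reasoning check is a crude stand-in for chain completeness (at least one intermediate step followed by a final `Answer:` line).

```python
import ast
from dataclasses import dataclass


@dataclass
class Sample:
    task: str   # "reasoning", "coding", or "agent" (hypothetical labels)
    text: str


def coding_check(sample):
    """Syntactic correctness: keep only code that Python's parser accepts."""
    try:
        ast.parse(sample.text)
        return True
    except (SyntaxError, ValueError):  # some Python versions raise ValueError for null bytes
        return False


def reasoning_check(sample):
    """Hypothetical proxy for chain completeness: at least one intermediate
    step, followed by a final line that starts with 'Answer:'."""
    lines = [line.strip() for line in sample.text.splitlines() if line.strip()]
    return len(lines) >= 2 and lines[-1].lower().startswith("answer:")


# Each task gets its own objective; tasks without a check pass through unchanged.
CHECKS = {"coding": coding_check, "reasoning": reasoning_check}


def filter_vertical(samples):
    return [s for s in samples if CHECKS.get(s.task, lambda _s: True)(s)]


samples = [
    Sample("coding", "def f(x):\n    return x + 1\n"),
    Sample("coding", "def f(x) return x"),           # syntax error: dropped
    Sample("reasoning", "2 + 2 = 4\nAnswer: 4"),
    Sample("reasoning", "Answer: 4"),                # no intermediate step: dropped
    Sample("agent", "search('GLM-4.5')"),
]
print([s.text for s in filter_vertical(samples)])
```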

Community Contributions and Multi-Ability Integration

Zhipu AI was one of the first companies in China to promote open-source large models. Since it open-sourced GLM-2 in 2023, a large developer community has formed around its models, reporting bugs and contributing lightweight deployment solutions. GLM-4.5’s “thinking/non-thinking mode” switch likely grew out of optimization suggestions from community developers.
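From an application’s point of view, a thinking/non-thinking switch amounts to a per-request flag. The sketch below is purely hypothetical: the `ChatRequest` fields, the token budgets, and the model identifier are invented for illustration and do not reflect Zhipu AI’s actual API schema.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChatRequest:
    # Hypothetical request shape; NOT Zhipu AI's real API schema.
    model: str
    prompt: str
    thinking: bool    # True: produce an explicit reasoning trace before the answer
    max_tokens: int


def build_request(prompt, thinking, model="glm-4.5"):
    """Thinking mode gets a larger (hypothetical) token budget, since the
    reasoning trace consumes output tokens before the final answer."""
    return ChatRequest(
        model=model,
        prompt=prompt,
        thinking=thinking,
        max_tokens=4096 if thinking else 1024,
    )


# A multi-step proof benefits from thinking; a quick lookup does not.
print(asdict(build_request("Prove that the sum of two odd numbers is even.", thinking=True)))
print(asdict(build_request("What is the capital of France?", thinking=False)))
```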

Zhipu AI’s official announcement claims it has “achieved native integration of reasoning, coding, and agent capabilities for the first time.” The technical barrier to such integration is getting the modules to work together: the logical reasoning of the reasoning module and the syntactic rules of the coding module belong to different cognitive paradigms, and forcing them together can dilute each capability.

Previous industry attempts at integration through “splicing” (attaching a coding module to a reasoning model) left the modules without shared parameters and significantly slowed responses. GLM-4.5 instead uses a unified underlying architecture, which requires planning parameter sharing from the initial design phase and demands a level of architectural design capability that most teams have not yet reached.
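The difference between splicing and a unified design can be made concrete with a deliberately simplified parameter and latency accounting. The sizes and latencies are hypothetical units, not measurements of any model, and the shared-backbone-plus-heads layout is a stand-in for the idea of planned parameter sharing, not a description of GLM-4.5’s actual structure.

```python
def spliced_params(backbone, head, n_capabilities):
    """Splicing: every capability brings its own full backbone, so nothing is shared."""
    return n_capabilities * (backbone + head)


def shared_params(backbone, head, n_capabilities):
    """Unified design: one backbone planned up front, plus a light head per capability."""
    return backbone + n_capabilities * head


def spliced_latency(per_model_latency, hops):
    """Handing a request from one spliced model to the next costs a full
    forward pass per hop, so latencies add up."""
    return per_model_latency * hops


# Hypothetical units: backbone = 10, head = 1, three capabilities
# (reasoning, coding, agent).
print(spliced_params(10, 1, 3))   # 33
print(shared_params(10, 1, 3))    # 13
print(spliced_latency(1.0, 2))    # a reasoning-then-coding chain: 2.0
```

The saving comes from counting the backbone once, which only works if the capabilities were designed to share it from the outset rather than bolted together afterward.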

Conclusion

Since its founding in 2019, Zhipu AI has completed at least 11 rounds of financing, which suggests that funding is not a pressing constraint. This lets the company optimize its architecture patiently and invest time in specialized work on multi-ability collaboration, a rarity in an industry focused on short-term returns. GLM-4.5’s breakthroughs are the combined result of technological accumulation, strategic choices, and ecosystem collaboration. The launch of this multi-ability model signals a shift in large-model competition from sheer parameter scale toward system efficiency and ecosystem vitality, and it may set new benchmarks for industry development and performance evaluation.