ARM is set to launch an AGI CPU built on a 3-nanometer process. This article uses the AGI CPU as a lens to analyze how changes in AI software are reshaping hardware demand and investment paradigms.
One, Current status of ARM CPU in AI applications
ARM has long dominated mobile CPU architecture, and Apple's M-series chips have also moved to the ARM architecture; meanwhile, ARM has quietly captured a significant share of the AI server market. (This article does not debate the merits of reduced instruction set computing; its performance has already been proven in the mobile market and in Apple's self-developed CPUs.)
ARM has not fully replaced x86 across all server scenarios, but it has prioritized the control layer of AI and cloud-native workloads. Its penetration of AI servers (measured by newly deployed CPUs) has reached 20-30% and continues to rise. Its advantages: it meets compute requirements, suits control-plane scenarios, is customizable, and is energy-efficient.
Two, AGI CPU architecture leap: From computation-driven to data-driven
2.1 Metric migration: changing the definition of performance
Traditional workloads measure a CPU by:
GHz (frequency: the number of clock cycles per second)
IPC (instructions per cycle: the number of instructions executed each cycle)
AI-era CPUs, represented by the AGI CPU, are measured by:
Memory bandwidth: the data transferred per unit time (see the measurement sketch after this list)
IO throughput: the speed of data input/output
Latency: the time for data to reach the compute unit
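To ground the first of these metrics, here is a minimal sketch, assuming nothing beyond NumPy, that estimates effective memory bandwidth by timing a large buffer copy. The buffer size and repeat count are illustrative choices; a real benchmark such as STREAM additionally controls for caches and NUMA effects:

```python
import time
import numpy as np

# Estimate effective memory bandwidth by timing a large buffer copy.
# The buffer is sized well beyond typical last-level caches so the copy
# actually hits DRAM.
N = 256 * 1024 * 1024 // 8            # 256 MiB of float64
src = np.ones(N, dtype=np.float64)
dst = np.empty_like(src)

reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    np.copyto(dst, src)               # each pass reads src and writes dst
elapsed = time.perf_counter() - t0

bytes_moved = 2 * src.nbytes * reps   # read + write per pass
print(f"effective bandwidth ~ {bytes_moved / elapsed / 1e9:.1f} GB/s")
```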
2.2 Analysis of past growth bottlenecks:
GPU compute power has grown much faster than memory capacity and data bandwidth have.
2.3 Software-induced changes in architectural bottlenecks:
Traditional: Limited by CPU and GPU computing power.
AI era: performance is limited by memory bandwidth, IO, and similar data-path factors (the roofline sketch below makes this concrete).
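To make "limited by memory bandwidth" concrete, here is a minimal roofline-style sketch. The peak figures are approximate public numbers for an NVIDIA A100 (about 312 TFLOP/s BF16, about 2 TB/s HBM2e) and are used only as illustrative assumptions:

```python
# Roofline-style sketch. Peak figures are approximate public numbers for an
# NVIDIA A100 (illustrative assumptions, not a spec sheet).
PEAK_FLOPS = 312e12     # ~312 TFLOP/s (BF16)
PEAK_BW = 2.0e12        # ~2 TB/s HBM2e, in bytes/s

# Ridge point: arithmetic intensity (FLOPs per byte moved) where the compute
# roof and the memory roof meet. Kernels below it are memory-bound.
print(f"ridge point ~ {PEAK_FLOPS / PEAK_BW:.0f} FLOPs/byte")

def attainable_flops(intensity: float) -> float:
    """Performance is capped by the lower of the two roofs."""
    return min(PEAK_FLOPS, PEAK_BW * intensity)

# Single-stream LLM decode does ~2 FLOPs per 2-byte fp16 weight read,
# i.e. intensity ~1 FLOP/byte, deep in memory-bound territory.
for ai in (1, 10, 100, 500):
    print(f"intensity {ai:>3} FLOPs/byte -> {attainable_flops(ai) / 1e12:.1f} TFLOP/s")
```

At an intensity of about 1 FLOP/byte, typical of single-stream LLM decoding, the chip can use well under 1% of its peak compute; only above the ridge point (about 156 FLOPs/byte in this sketch) does raw compute power matter at all.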
2.4 Redefining the CPU perspective:
Traditional: CPU = Computing core
AI era: CPU = compute + data scheduling + flow-control core.
Three, Changes on the AI application side: From computing demand to data demand
3.1 Breaking down AI application bottlenecks:
LLM training: the bottleneck is data bandwidth.
Inference and retrieval tasks: the bottlenecks are memory capacity and IO.
Infrastructure requirements such as inference acceleration and vector-database retrieval all point in the same directions: better scheduling, shorter physical distance between data and compute, wider bandwidth, and larger capacity. These are the new demands investment should focus on (the sizing sketch below shows why capacity dominates inference).
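A quick sizing sketch shows why memory capacity, not compute, dominates inference. The model shape below (80 layers, 8 KV heads, head dimension 128) is a hypothetical 70B-class configuration used only for illustration:

```python
# Estimate the KV cache an inference server must hold in memory.
# Per token: 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch

# Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128.
gb = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                    seq_len=32_768, batch=16) / 1e9
print(f"KV cache ~ {gb:.0f} GB")   # roughly 170 GB before weights are counted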
Four, HBM (High Bandwidth Memory): The 'new oil' of the AI era
DDR5 bandwidth: roughly 50-100 GB/s for a typical dual-channel system
HBM bandwidth: 800-1,000+ GB/s per stack
In an AI server's bill of materials, GPU + HBM accounts for 50-70% of total cost, and HBM makes up 20-30% of that portion.
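The bandwidth gap translates directly into serving speed. A back-of-the-envelope sketch, under the simplifying assumption that each decoded token must stream the full fp16 weight set from memory once:

```python
# Upper bound on single-stream decode speed: every generated token must
# stream the full weight set from memory, so tokens/s <= bandwidth / bytes.
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: int,
                            bandwidth_gb_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

for name, bw in [("DDR5-class, 100 GB/s", 100), ("HBM-class, 1000 GB/s", 1000)]:
    tps = max_decode_tokens_per_s(70, 2, bw)   # 70B model, fp16 weights
    print(f"{name}: <= {tps:.1f} tokens/s per stream")
```

On commodity DDR5 a 70B fp16 model cannot exceed roughly 0.7 tokens/s per stream; HBM raises that ceiling about tenfold, which is why it is priced like oil.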
Five, CXL (Compute Express Link): A key variable for resource efficiency
Traditional: memory is attached one-to-one to a CPU, so utilization is low; capacity stranded on idle hosts cannot be lent out.
CXL: moves toward a shared memory pool. The path is incremental and takes time: first Apple-style unified memory, where CPU and GPU share one memory space; then an ARM AGI CPU sharing data structures further so computation results no longer need to be shuttled back and forth; then sharing across multi-CPU/GPU clusters; and finally multiple servers sharing one memory pool. Technological development is never achieved overnight; bottleneck breakthroughs start at the physical architecture, solving the most fundamental issues directly (the pooling sketch below shows the utilization gain).
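A toy simulation, with an assumed random workload rather than real CXL behavior, illustrates the utilization gain that pooling targets:

```python
import random

# Toy comparison of stranded memory: one-to-one attachment vs. one shared
# pool. The demand distribution is an arbitrary assumption.
random.seed(0)
HOSTS, PER_HOST_GB = 8, 512
demands = [random.randint(64, 768) for _ in range(HOSTS)]   # GB per host

# One-to-one: a host can only use its local DIMMs; spare capacity elsewhere
# is stranded while excess demand goes unserved.
local_served = sum(min(d, PER_HOST_GB) for d in demands)

# Pooled (the CXL direction): all capacity behaves as one fabric-attached pool.
pooled_served = min(sum(demands), HOSTS * PER_HOST_GB)

print(f"total demand        : {sum(demands)} GB")
print(f"served, one-to-one  : {local_served} GB")
print(f"served, shared pool : {pooled_served} GB")
```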
Investment perspective: CXL controllers (the core chips), memory expansion devices (the expansion hardware), and the data-center software layer (resource scheduling).
Six, Investment paradigm shift: From computing power to data flow
6.1 Computing layer + scheduling layer
The GPU handles dedicated computation.
The CPU takes on data scheduling + process control.
The computing and scheduling layers must change to meet the new demand profile; how well they adapt will directly determine future market share (see the pipelining sketch below).
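A minimal sketch of the CPU's new scheduling role, with hypothetical names and timings: a producer thread stands in for the CPU staging data, a consumer stands in for the GPU, and a bounded queue keeps the accelerator fed instead of waiting on data:

```python
import queue
import threading
import time

# Producer/consumer pipeline: the "CPU" prefetches and stages batches while
# the "GPU" computes. All names and sleep durations are illustrative.
staged: queue.Queue = queue.Queue(maxsize=4)   # bounded staging buffer

def cpu_scheduler(n_batches: int) -> None:
    for i in range(n_batches):
        time.sleep(0.01)                       # pretend: load + preprocess
        staged.put(f"batch-{i}")               # hand off to the accelerator
    staged.put(None)                           # end-of-stream sentinel

def gpu_worker() -> None:
    while (batch := staged.get()) is not None:
        time.sleep(0.02)                       # pretend: kernel execution
        print(f"computed {batch}")

t = threading.Thread(target=cpu_scheduler, args=(8,))
t.start()
gpu_worker()
t.join()
```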
6.2 Data Layer
HBM (Bandwidth bottleneck)
Storage (Capacity bottleneck)
This layer has the strongest certainty: technological barriers confer pricing power, and the market belongs to a handful of oligopoly companies.
6.3 Transmission Layer
NVLink (high-speed chip-to-chip interconnect), CXL (memory fabric), and similar directions offer great room for imagination, but they evolve alongside software and hardware scheduling architectures and face many substitutes (a transfer-time comparison follows).
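For a sense of scale, a back-of-the-envelope comparison; the bandwidth figures are approximate public numbers (PCIe 5.0 x16 at roughly 64 GB/s per direction, fourth-generation NVLink at roughly 450 GB/s per direction per GPU) used here as assumptions:

```python
# Back-of-the-envelope transfer times over two interconnects.
# Bandwidths are approximate per-direction figures, assumed for illustration.
LINKS_GB_S = {"PCIe 5.0 x16": 64, "NVLink gen4": 450}
PAYLOAD_GB = 140   # e.g., streaming a 70B fp16 weight set between devices

for name, bw in LINKS_GB_S.items():
    print(f"{name}: {PAYLOAD_GB / bw:.2f} s to move {PAYLOAD_GB} GB")
```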
Predictable capital flow paths: CPU → GPU → Memory → Interconnect
This analysis applies to the US stock market, to #RWA on-chain transactions currently in progress, and to material selection for cryptocurrency projects. Feel free to comment and discuss any questions!
