Product

The Wafer-Scale Advantage

The Cerebras WSE-3 surpasses all other processors in the number of AI-optimized cores, memory bandwidth, and on-chip fabric bandwidth.

Specification	WSE-3	Nvidia H100	Cerebras Advantage
Chip size	46,225 mm²	814 mm²	57x
Cores	900,000	16,896 FP32 + 528 Tensor	52x
On-chip memory	44 Gigabytes	0.05 Gigabytes	880x
Memory bandwidth	21 Petabytes/sec	0.003 Petabytes/sec	7,000x
Fabric bandwidth	214 Petabits/sec	0.0576 Petabits/sec	3,715x
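The ratios in the rightmost column follow directly from the two device columns. As a quick check, the Python sketch below recomputes them for the core, memory, and bandwidth rows using only the figures quoted in the table. The names `SPECS` and `advantage` are illustrative and not part of any Cerebras tooling.

```python
# Figures copied from the comparison table, as (WSE-3, Nvidia H100) pairs.
SPECS = {
    "Cores": (900_000, 16_896 + 528),  # H100: FP32 cores + Tensor cores
    "On-chip memory (GB)": (44, 0.05),
    "Memory bandwidth (PB/s)": (21, 0.003),
    "Fabric bandwidth (Pb/s)": (214, 0.0576),
}


def advantage(wse3: float, h100: float) -> float:
    """Return how many times larger the WSE-3 figure is than the H100 figure."""
    return wse3 / h100


ratios = {name: advantage(*pair) for name, pair in SPECS.items()}

# Print each ratio rounded to a whole multiple, as the table does.
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f}x")
```

Rounded to whole numbers, these come out to the 52x, 880x, 7,000x, and 3,715x figures shown in the table.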

52x more AI compute cores

Compute Designed for AI

The WSE-3 packs 900,000 AI cores onto a single processor. Each core is independently programmable and optimized for the tensor-based, sparse linear algebra operations that underpin neural network training and inference, which lets the chip deliver high performance, efficiency, and flexibility.

880x more on-chip memory

Memory Capacity and Bandwidth: Why Choose?

Unlike traditional devices, in which the working cache memory is tiny, the WSE-3 spreads 44 GB of fast on-chip SRAM evenly across the entire surface of the chip. This gives every core single-clock-cycle access to fast memory at an aggregate bandwidth of 21 PB/s. That is 880x the capacity and 7,000x the bandwidth of the Nvidia H100.
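To put the even spread of memory in per-core terms, the rough Python sketch below divides the quoted totals by the quoted core count. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a perfectly even split across all 900,000 cores. The actual per-core SRAM size is not stated here, so treat the results as back-of-envelope estimates only.

```python
CORES = 900_000               # AI cores on the WSE-3 (from the text)
SRAM_BYTES = 44 * 10**9       # 44 GB of on-chip SRAM; decimal GB is an assumption
BANDWIDTH_BPS = 21 * 10**15   # 21 PB/s memory bandwidth; decimal PB is an assumption

# Even split across cores, which is an assumption of this sketch.
sram_per_core_kb = SRAM_BYTES / CORES / 10**3
bandwidth_per_core_gbs = BANDWIDTH_BPS / CORES / 10**9

print(f"SRAM per core: about {sram_per_core_kb:.1f} kB")
print(f"Memory bandwidth per core: about {bandwidth_per_core_gbs:.1f} GB/s")
```

Under those assumptions, each core would have on the order of tens of kilobytes of SRAM next to it, which is what makes single-cycle access plausible.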

3,715x more fabric bandwidth

High Bandwidth. Low Latency.

The WSE-3 on-wafer interconnect eliminates the communication slowdown and inefficiency of connecting hundreds of small devices via wires and cables. It delivers 214 Pb/s of processor-to-processor interconnect bandwidth, roughly 3,715x the bandwidth available between graphics processors.
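Note that the fabric figures are quoted in petabits per second, while the memory bandwidth figures on this page are in petabytes per second. A short Python sketch of the conversion, assuming 8 bits per byte and decimal prefixes, with the H100 value taken from the comparison table:

```python
BITS_PER_BYTE = 8

wse3_fabric_pbit_s = 214       # WSE-3 fabric bandwidth in petabits/s (from the text)
h100_fabric_pbit_s = 0.0576    # H100 figure from the comparison table, petabits/s

# Convert bits to bytes; 1 PB = 1000 TB under decimal prefixes.
wse3_fabric_pbyte_s = wse3_fabric_pbit_s / BITS_PER_BYTE
h100_fabric_tbyte_s = h100_fabric_pbit_s / BITS_PER_BYTE * 1000

print(f"WSE-3 fabric: {wse3_fabric_pbyte_s:.2f} PB/s")
print(f"H100 fabric: {h100_fabric_tbyte_s:.1f} TB/s")
```

Expressed in bytes, the fabric figure is directly comparable to the 21 PB/s memory bandwidth quoted above.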

Cluster-Scale Performance on a Single Chip

Programming a cluster to scale deep learning is painful. It typically requires dozens to hundreds of engineering hours, a practical barrier that keeps many teams from realizing the value of large-scale AI in their work.

On a traditional GPU cluster, ML researchers, typically using a special version of their ML framework, must figure out how to distribute their model while still achieving some fraction of their convergence and performance targets. They must navigate a complex hierarchy of per-processor memory capacity, bandwidth, interconnect topology, and synchronization, all while running a myriad of hyperparameter and tuning experiments along the way. Worse, the resulting implementation is brittle to change, and the time spent on it only delays the overall time to solution.

With the WSE, that distribution bottleneck disappears. We give you a cluster-scale AI compute resource with the programming ease of a single desktop machine using stock PyTorch. Spend your time on AI discovery, not cluster engineering.

The future of AI is Wafer-Scale
