Web & Social Media

Cerebras enables hyperscale AI compute for every organization

Industry Challenge:

State-of-the-art AI models are changing how we find and share information online. However, these models and datasets are so large that very few organizations have sufficient compute to use them.

The Cerebras Wafer-Scale Engine (WSE-2) delivers cluster-scale AI compute at a fraction of the space and power footprint of other engines, radically accelerating compute time in a package accessible to many more organizations.

Use Case

Text classification

Web services rely on accurate text classification algorithms for applications ranging from comment moderation to customer service assistants. Profanity and hate speech detection, sentiment analysis for brand monitoring and customer assistance, and support ticket routing are just a few common examples.

These services are powered by complex AI language models, which are slow to train. With Cerebras’ revolutionary WSE-2, these models can be trained in just hours on a single CS-2 system.
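To make the task concrete, here is a toy bag-of-words Naive Bayes classifier for routing support tickets. This is an illustrative sketch of the classification task only, with made-up example tickets and labels; the production systems described here use large neural language models, not this algorithm.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count label and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical support-ticket routing data
model = train([
    ("my card was charged twice", "billing"),
    ("refund the duplicate charge", "billing"),
    ("the app crashes on login", "technical"),
    ("error message when I log in", "technical"),
])
print(classify(model, "I was charged twice for my order"))  # → billing
```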

Use Case

Search and Q&A

State-of-the-art search engines and Q&A services are also powered by massive AI language models that often take days or weeks to train on enormous GPU clusters. Once trained, these models are often too large to fit in low-latency inference pipelines that serve interactive user requests; they must be pruned and quantized to fit the latency budget, at the cost of reduced accuracy.

Cerebras’ systems allow researchers to build larger, smarter models in a fraction of the time, using only a single device. Because all cores and memory are on-chip, you can run inference on large, state-of-the-art language models without quantizing, downsizing, or sacrificing accuracy to fit into a real-time latency budget.
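The accuracy cost mentioned above comes from the rounding that quantization introduces. A minimal sketch, using made-up weight values, of mapping float weights to 8-bit integers and measuring the resulting error:

```python
# Illustrative only: quantize a few float weights to int8 and back.
weights = [0.127, -0.043, 0.910, -0.552]
scale = max(abs(w) for w in weights) / 127  # map the value range onto int8

quantized = [round(w / scale) for w in weights]        # int8 codes
dequantized = [q * scale for q in quantized]           # reconstructed floats

# Round-to-nearest bounds the per-weight error by scale / 2.
error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)
print(f"max rounding error: {error:.4f}")
```

Each weight now carries a small reconstruction error; summed across billions of parameters, such errors are what degrade model accuracy.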

Use Case

Recommendation engines

Recommendation engines drive many digital businesses. For these engines to be accurate and fast, the AI models that power them must be trained on massive text or graph datasets, then served with low latency and high throughput.

The CS-2’s 850,000-core processor with on-wafer interconnect enables high-speed large-model training and inference, delivering better recommendations, faster.
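A common serving pattern for such engines is embedding-based retrieval: rank items by similarity between a user vector and item vectors. The sketch below uses tiny hand-written vectors and hypothetical item names purely to illustrate the scoring step; real systems learn these embeddings from the massive datasets described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up item embeddings (real ones are learned, and much higher-dimensional)
item_vectors = {
    "article_sports":  [0.9, 0.1, 0.0],
    "article_tech":    [0.1, 0.9, 0.2],
    "article_cooking": [0.0, 0.2, 0.9],
}

def recommend(user_vector, items, k=2):
    """Return the k items most similar to the user's embedding."""
    ranked = sorted(items, key=lambda name: cosine(user_vector, items[name]),
                    reverse=True)
    return ranked[:k]

print(recommend([0.2, 0.8, 0.1], item_vectors))
```

At serving time this similarity search must run within a strict latency budget, which is why model size and memory bandwidth dominate the engineering of these pipelines.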