Tech Disruptors

Beyond Big Chips: Cerebras on Inference and AI

May 13, 2025
Andrew Feldman, Co-founder and CEO of Cerebras Systems, dives into the fascinating evolution from giant chips to cutting-edge inference clouds. He discusses how owning the entire tech stack can dramatically outperform GPU clusters. Feldman shares insights on recent collaborations with Meta and IBM, the impact of the WSE-3 upgrade, and new data centers aimed at delivering faster, cost-effective AI solutions. He also addresses the challenges and future potential of AI technologies in a competitive landscape.
INSIGHT

Shift to Fastest Inference Cloud

  • Cerebras shifted focus from building the biggest chip to running the fastest inference cloud as AI moved from novelty to utility.
  • The explosion of inference demand drove their roadmap to deliver fast, cost-effective AI inference services.
ANECDOTE

Major Meta and IBM Partnerships

  • Meta partnered with Cerebras to serve open-weight models through an API, making those models more readily available to Meta's large developer community.
  • IBM also selected Cerebras to provide inference services through its watsonx AI cloud platform for enterprise customers.
INSIGHT

Memory Bandwidth Drives Inference Speed

  • GPU inference typically runs on small 8-GPU setups, where limited memory bandwidth caps token throughput.
  • Cerebras' Wafer-Scale Engine has roughly 7,000x more memory bandwidth, enabling 40 to 70 times faster inference than GPUs (a rough bandwidth-bound estimate is sketched below).
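A minimal back-of-envelope sketch of why memory bandwidth bounds inference speed: during autoregressive decoding, each generated token must stream roughly all model weights through memory once, so the decode rate is capped by bandwidth divided by weight bytes. The model size, precision, and bandwidth figures below are illustrative assumptions for the sketch, not specifications from the episode.

```python
# Bandwidth-bound ceiling on single-stream autoregressive decode speed:
#   tokens/sec <= memory_bandwidth / bytes_of_weights_read_per_token
# All numbers below are illustrative assumptions, not vendor specs.

def decode_tokens_per_sec(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on decode rate when weight reads dominate each token."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / weight_bytes

# Hypothetical 70B-parameter model served in 16-bit (2-byte) weights.
model_b, bytes_per_param = 70.0, 2.0

# Assumed order-of-magnitude bandwidths: HBM on a single modern GPU
# vs. on-wafer SRAM, which is thousands of times higher.
examples = [
    ("single HBM GPU", 3_000),          # ~3 TB/s
    ("wafer-scale SRAM", 21_000_000),   # ~21 PB/s
]

for name, bw_gb_s in examples:
    ceiling = decode_tokens_per_sec(model_b, bytes_per_param, bw_gb_s)
    print(f"{name}: ~{ceiling:,.0f} tokens/s ceiling")
```

In practice batching, compute limits, and interconnect overheads shift the real numbers, but the sketch shows why a large bandwidth gap translates directly into a large gap in per-token latency.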