Tech Disruptors

Beyond Big Chips: Cerebras on Inference and AI

May 13, 2025
Andrew Feldman, Co-founder and CEO of Cerebras Systems, dives into the fascinating evolution from giant chips to cutting-edge inference clouds. He discusses how owning the entire tech stack can dramatically outperform GPU clusters. Feldman shares insights on recent collaborations with Meta and IBM, the impact of the WSE-3 upgrade, and new data centers aimed at delivering faster, cost-effective AI solutions. He also addresses the challenges and future potential of AI technologies in a competitive landscape.
INSIGHT

Shift to Fastest Inference Cloud

  • Cerebras shifted focus from building the biggest chip to running the fastest inference cloud as AI moved from novelty to utility.
  • The explosion of inference demand drove their roadmap to deliver fast, cost-effective AI inference services.
ANECDOTE

Major Meta and IBM Partnerships

  • Meta partnered with Cerebras to serve open-weight models through an API, making those models more readily available to Meta's large developer community.
  • IBM also selected Cerebras to provide inference services through its watsonx AI cloud platform for enterprise customers.
INSIGHT

Memory Bandwidth Drives Inference Speed

  • GPU inference typically runs on small 8-GPU setups, where limited memory bandwidth caps token throughput.
  • Cerebras' Wafer-Scale Engine has roughly 7,000x more memory bandwidth, enabling 40 to 70 times faster inference than GPUs (a rough bandwidth-bound estimate is sketched below).
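A minimal back-of-envelope sketch of why memory bandwidth bounds inference speed: during autoregressive decoding, each generated token must stream roughly all model weights through memory once, so the decode rate is capped by bandwidth divided by weight bytes. The model size, precision, and bandwidth figures below are illustrative assumptions for the sketch, not specifications from the episode.

```python
# Bandwidth-bound ceiling on single-stream autoregressive decode speed:
#   tokens/sec <= memory_bandwidth / bytes_of_weights_read_per_token
# All numbers below are illustrative assumptions, not vendor specs.

def decode_tokens_per_sec(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on decode rate when weight reads dominate each token."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / weight_bytes

# Hypothetical 70B-parameter model served in 16-bit (2-byte) weights.
model_b, bytes_per_param = 70.0, 2.0

# Assumed order-of-magnitude bandwidths: HBM on a single modern GPU
# vs. on-wafer SRAM, which is thousands of times higher.
examples = [
    ("single HBM GPU", 3_000),          # ~3 TB/s
    ("wafer-scale SRAM", 21_000_000),   # ~21 PB/s
]

for name, bw_gb_s in examples:
    ceiling = decode_tokens_per_sec(model_b, bytes_per_param, bw_gb_s)
    print(f"{name}: ~{ceiling:,.0f} tokens/s ceiling")
```

In practice batching, compute limits, and interconnect overheads shift the real numbers, but the sketch shows why a large bandwidth gap translates directly into a large gap in per-token latency.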