

Powering AI with the World's Largest Computer Chip with Joel Hestness - #684
May 13, 2024
In this episode, Joel Hestness, a principal research scientist and lead of the core machine learning team at Cerebras, dives into the Wafer Scale Engine 3. He explains how this custom silicon departs from traditional GPU-based AI hardware, focusing on its unique architecture and on-chip memory. Joel also covers advances in large language model training, optimization techniques, and the integration of open-source frameworks like PyTorch. Finally, he shares research on weight-sparse training and on novel optimizers that leverage higher-order statistics.
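Weight-sparse training keeps most model weights at exactly zero throughout training, rather than pruning a dense model afterwards. The episode does not spell out Cerebras' method, so the following is only a minimal PyTorch sketch of one common variant, static magnitude-based masking; the model, sparsity level, and masking schedule are illustrative assumptions, not Cerebras' implementation.

```python
import torch
import torch.nn as nn

SPARSITY = 0.75  # assumed fraction of weights held at zero (illustrative)

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude (1 - sparsity) fraction of entries."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
# Mask only weight matrices (dim > 1), leaving biases dense.
masks = {n: magnitude_mask(p, SPARSITY)
         for n, p in model.named_parameters() if p.dim() > 1}
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Zero the gradients of pruned weights so they receive no updates.
    for n, p in model.named_parameters():
        if n in masks and p.grad is not None:
            p.grad.mul_(masks[n])
    opt.step()
    # Re-apply the mask as a safeguard, keeping pruned entries exactly zero.
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])
    return loss.item()

x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
print(train_step(x, y))
```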
Joel's Background
- Joel Hestness's background is in heterogeneous processor design, focusing on CPU and GPU coordination.
- His work at Baidu Research involved studying scaling laws for large language and speech models.
Cerebras' Approach
- Cerebras aims to simplify large-scale model training with its Wafer Scale Engine.
- It programs like a single, large device, avoiding the complexities of distributed GPU programming (see the sketch below).
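The practical payoff of that single-device view is that a plain training step needs no process groups, sharding, or collective communication. The contrast below is a generic illustration in standard PyTorch (DistributedDataParallel shown only for comparison); it is not Cerebras' actual API.

```python
import torch
import torch.nn as nn

# Single-device style: the entire model is addressed as one device.
model = nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()

# GPU-cluster style, for contrast: one process per device, explicit
# process-group setup, and a wrapped model, e.g.
#   torch.distributed.init_process_group("nccl")
#   model = nn.parallel.DistributedDataParallel(model.cuda())
# plus per-rank data sharding (DistributedSampler) and checkpoint handling.
```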
Wafer Scale Engine
- Cerebras' Wafer Scale Engine is a single, large chip, unlike traditional multi-chip solutions.
- It requires unique packaging and cooling solutions due to its size and power consumption.