Joel Hestness of Cerebras discusses the Wafer Scale Engine 3 for large language models, software support for ML frameworks, and research on weight-sparse training. The episode explores the unique design of the WSE chip, optimizations for AI clusters, and applications in medical, financial, and civil support services.
Quick takeaways
Cerebras' Wafer Scale Engine 3 offers a unique AI hardware solution for large language models.
Cerebras' software stack enables ultra-low latency for inference and supports deployment on various platforms.
Deep dives
Joel's Background in Machine Learning and Heterogeneous Processor Design
Joel Hestness discusses his journey from specializing in heterogeneous processor design during his PhD to working on large-scale language and speech recognition models at Baidu. That experience highlighted the need for scalable compute for complex machine learning applications.
Cerebras' Focus on Large-Scale Training for Language Applications
Cerebras' hardware approach centers on large-scale training for language applications. The Cerebras Wafer Scale Engine functions as a single, extremely large device, which makes training large language models more efficient and streamlined by simplifying compute distribution and coordination.
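For a sense of the coordination code a single large device removes, here is a minimal sketch of the standard multi-GPU setup that per-device frameworks require (generic PyTorch DistributedDataParallel boilerplate, not Cerebras code):

```python
# Standard PyTorch multi-GPU setup (illustrative, not Cerebras' API):
# on a GPU cluster, the user must initialize process groups, pin ranks
# to devices, and wrap the model so gradients are synchronized.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module, rank: int, world_size: int):
    # Assumes launch via torchrun, which provides the rendezvous
    # environment (MASTER_ADDR, MASTER_PORT).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = model.to(rank)
    # DDP inserts gradient all-reduce across ranks automatically.
    return DDP(model, device_ids=[rank])
```

On a single wafer-scale device, this coordination layer disappears: the model trains as if on one accelerator.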
Software Stack for Training and Inference on Cerebras Hardware
Cerebras' software stack supports training large language models while abstracting away the complexity of distributed computing. Leveraging the wafer's unique architecture, Cerebras achieves ultra-low latency for inference, which is particularly beneficial for exploiting activation sparsity and improving computational efficiency. The stack also supports deploying trained models to other platforms, such as CPUs and GPUs, with compatibility for frameworks like PyTorch and TensorFlow.
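To make the activation-sparsity point concrete, here is a small generic PyTorch sketch (not Cerebras' stack) showing how ReLU naturally zeroes a large fraction of activations; hardware that skips zero operands can avoid the corresponding multiply-accumulates entirely:

```python
# Illustration of activation sparsity (generic PyTorch, not Cerebras code).
import torch

x = torch.randn(1024, 4096)                    # a batch of hidden states
h = torch.relu(x @ torch.randn(4096, 4096))    # ReLU zeroes negatives

sparsity = (h == 0).float().mean().item()
print(f"fraction of zero activations: {sparsity:.2%}")  # roughly 50% here
```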
Collaborations with Organizations Using Cerebras Hardware
Cerebras has partnered with institutions like Mayo Clinic and pharma companies like GlaxoSmithKline to advance applications in drug discovery and genomics using large language models. These collaborations explore cutting-edge techniques like sparsity optimization and pruning for model efficiency. Additionally, deployments on specialized hardware from partners like Qualcomm further enhance inference performance beyond traditional GPU capabilities.
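As an illustration of the pruning technique mentioned above, here is a generic magnitude-pruning sketch using PyTorch's built-in utilities (an example of the general approach, not the partners' actual pipeline):

```python
# Generic magnitude pruning with PyTorch's built-in utilities.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)
# Zero out the 90% of weights with the smallest absolute value.
prune.l1_unstructured(layer, name="weight", amount=0.9)
# Fold the mask into the weight tensor to make pruning permanent.
prune.remove(layer, "weight")
```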
Today we're joined by Joel Hestness, principal research scientist and lead of the core machine learning team at Cerebras. We discuss Cerebras' custom silicon for machine learning, Wafer Scale Engine 3, and how the latest version of the company's single-chip platform for ML has evolved to support large language models. Joel shares how WSE3 differs from other AI hardware solutions, such as GPUs, TPUs, and AWS' Inferentia, and talks through the homogeneous design of the WSE chip and its memory architecture. We discuss software support for the platform, including support by open source ML frameworks like PyTorch, and support for different types of transformer-based models. Finally, Joel shares some of the research his team is pursuing to take advantage of the hardware's unique characteristics, including weight-sparse training, optimizers that leverage higher-order statistics, and more.
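As a rough illustration of what weight-sparse training involves (a minimal generic sketch of one common formulation, not Cerebras' published method): fix a sparsity mask up front and re-apply it after each optimizer step so pruned weights stay zero throughout training.

```python
# Minimal weight-sparse training sketch (generic formulation,
# not Cerebras' method): a fixed mask keeps ~10% of the weights.
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096)
mask = (torch.rand_like(model.weight) > 0.9).float()

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
with torch.no_grad():
    model.weight.mul_(mask)           # apply the mask once at the start

for _ in range(10):
    x = torch.randn(32, 4096)
    loss = model(x).pow(2).mean()     # placeholder objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        model.weight.mul_(mask)       # re-enforce the sparsity pattern
```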
The complete show notes for this episode can be found at twimlai.com/go/684.