#281 Leon Song: The Research Driving Next-Gen Open-Source Models (Together AI)

Eye On A.I.

Understanding Large Mixture of Experts Models and Autoregressive Decoding

This chapter explores the complexities of running the roughly 670-billion-parameter DeepSeek R1 model in a multi-node setup, focusing on autoregressive decoding, where tokens are generated one at a time. It highlights how these challenges affect the effectiveness of transformer architectures in modern AI applications.
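
To make the decoding process concrete, here is a minimal sketch of greedy autoregressive generation. It is an illustration only, not the episode's or any library's implementation: `toy_next_token_logits` is a hypothetical stand-in for a real model's forward pass, and `greedy_decode`, `vocab_size`, and `eos_id` are names assumed for this example. The point it shows is that each new token depends on all previously generated tokens, so the steps run one after another.

```python
from typing import Callable, List


def toy_next_token_logits(tokens: List[int], vocab_size: int = 8) -> List[float]:
    """Hypothetical stand-in for a model forward pass (assumption, not a real model).

    Returns one score per vocabulary entry, favouring the token that
    follows the last one (modulo vocab_size).
    """
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


def greedy_decode(
    prompt: List[int],
    next_logits: Callable[[List[int]], List[float]],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Generate tokens one at a time, feeding each output back in as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # One full forward pass per generated token.
        logits = next_logits(tokens)
        # Greedy choice: pick the highest-scoring token.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if next_token == eos_id:
            break
    return tokens


result = greedy_decode([2], toy_next_token_logits, max_new_tokens=10, eos_id=5)
print(result)
```

In a real deployment the forward pass inside the loop is the expensive part. For a large model spread across several nodes, that cost is paid once per generated token.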
