
Interviewing Louis Castricato of Synth Labs and Eleuther AI on RLHF, Gemini Drama, DPO, founding Carper AI, preference data, reward models, and everything in between

Interconnects

Unpacking Large-Scale Model Training (chapter begins at 01:12:19)

This chapter digs into the practicalities of training a 70-billion-parameter model: GPU utilization, node requirements, and the complexities of Proximal Policy Optimization (PPO). It traces how Reinforcement Learning from Human Feedback (RLHF) has evolved, the advances made along the way, and the challenges of running it at scale. Personal anecdotes are woven through the technical discussion, underscoring the collaborative efforts and evolving frameworks behind this work.

