
Interconnects
A recipe for frontier model post-training
Aug 7, 2024
The discussion covers recent advances in reinforcement learning from human feedback, focusing on the Llama 3.1 model. Reports from Apple, Meta, and Nvidia converge on the importance of synthetic data and iterative training. Data quality emerges as a central theme, with the major labs settling on similar standards for post-training. The episode shows how these companies are adapting to this evolving landscape, shifting toward more refined pipelines that combine rigorous filtering with human preference data.
Podcast summary created with Snipd AI
Quick takeaways
- The episode highlights a pivotal shift toward synthetic data for Reinforcement Learning from Human Feedback (RLHF), which is replacing much of the human-generated data previously used in post-training to improve model performance.
- It emphasizes the critical role of data quality and aggressive filtering at labs such as Apple, Meta, and NVIDIA in achieving strong training outcomes.
Deep dives
The Evolving Landscape of RLHF
The podcast describes a significant shift in how Reinforcement Learning from Human Feedback (RLHF) is approached, prompted by new models such as Llama 3.1 and NVIDIA's Nemotron. These releases suggest that synthetic data is now preferred over human-generated data, particularly for complex tasks. A key insight is that RLHF can scale further than instruction tuning, which means iterative rounds of generation, filtering, and training are needed to get the best performance. This methodology marks a departure from earlier practices and points to a growing reliance on synthetic data to improve post-training outcomes.
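To make the iterative recipe concrete, here is a minimal sketch of one round of the loop the episode describes: generate candidate completions from the current model, score them with a reward signal, keep only the best (rejection sampling), and fine-tune on the filtered set before repeating. The function names and toy stand-ins below are illustrative assumptions, not any lab's actual pipeline or a specific library's API.

```python
# Sketch of an iterative generate -> filter -> train post-training loop.
# `generate`, `reward`, and `finetune` are hypothetical stand-ins; real runs
# would call an LLM for generation and a learned reward model for scoring.

import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], List[str]],  # returns k candidate completions per prompt
    reward: Callable[[str, str], float],   # scores a (prompt, completion) pair
    keep_top: int = 1,
) -> List[Tuple[str, str]]:
    """Keep only the highest-reward completions per prompt for the next training pass."""
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = generate(prompt)
        scored = sorted(candidates, key=lambda c: reward(prompt, c), reverse=True)
        kept.extend((prompt, c) for c in scored[:keep_top])
    return kept


def iterative_post_training(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    reward: Callable[[str, str], float],
    finetune: Callable[[List[Tuple[str, str]]], None],
    rounds: int = 3,
) -> None:
    """Run several rounds; each round's fine-tune would update the policy used next round."""
    for _ in range(rounds):
        dataset = rejection_sample(prompts, generate, reward)
        finetune(dataset)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_generate = lambda p: [f"{p} -> draft {i} ({random.random():.2f})" for i in range(4)]
    toy_reward = lambda p, c: float(c.split("(")[-1].rstrip(")"))
    toy_finetune = lambda data: print(f"fine-tuning on {len(data)} filtered pairs")

    iterative_post_training(
        ["explain RLHF", "summarize Llama 3.1"],
        toy_generate, toy_reward, toy_finetune, rounds=2,
    )
```

In practice, each round would also mix in fresh human preference data and additional quality filters before fine-tuning, which is the part of the recipe the episode stresses most.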