
EAG Talks

Lessons from reinforcement learning from human feedback | Stephen Casper | EAG Boston 23

Nov 23, 2023
55:40

Reinforcement Learning from Human Feedback (RLHF) has emerged as the central alignment technique used to fine-tune state-of-the-art systems such as GPT-4, Claude-2, Bard, and Llama-2. However, RLHF has a number of known problems, and these models have exhibited some troubling alignment failures. How did we get here? What lessons should we learn? And what does it mean for the next generation of AI systems?

Stephen is a third-year Computer Science Ph.D. student at MIT in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. He has previously worked with the Harvard Kreiman Lab and the Center for Human-Compatible AI. His main focus is on interpreting, diagnosing, debugging, and auditing deep learning systems.
