Alan Cowen: Creating Empathic AI with Hume

Generative Now | AI Builders on Creating the Future

Optimizing Models for User Happiness

Reinforcement learning from human feedback (RLHF) relies on human labels, which are not robust and may reflect the biased opinions of arbitrary raters. Models trained on this feedback learn to chase positive ratings rather than genuine user experience, leading to sycophantic and bland responses. The goal should instead be to optimize model responses for user happiness, while acknowledging the limitations of text-only input data.
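
For illustration only (not discussed in the episode): RLHF reward models are typically trained on pairwise rater preferences with a Bradley-Terry style loss, so whatever raters happen to favor, including flattering or bland answers, becomes the signal that later policy optimization chases. A minimal PyTorch-style sketch of that loss:

import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor,
                      reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise objective: push the rater-preferred response
    # above the rejected one. Any rater bias or noise is baked into the reward.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scalar rewards the reward model assigned to two response pairs.
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.9, 0.5])
print(reward_model_loss(chosen, rejected))  # loss falls as preferred responses pull ahead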

