Prof. Murray Shanahan - Machines Don't Think Like Us

Machine Learning Street Talk (MLST)

Navigating AI Behavior: The Implications of RLHF and Alternative Approaches

This chapter explores Reinforcement Learning from Human Feedback (RLHF) and its effects on AI behavior, with a focus on the 'Waluigi Effect,' in which a model tuned toward desired behavior can unexpectedly flip to the opposite. The discussion also covers the difficulty of ensuring model alignment and considers alternatives to RLHF, such as constitutional AI.
