Prof. Murray Shanahan - Machines Don't Think Like Us

Machine Learning Street Talk (MLST)

CHAPTER

Navigating AI Behavior: The Implications of RLHF and Alternative Approaches

This chapter explores Reinforcement Learning from Human Feedback (RLHF) and its effects on AI behavior, focusing on the 'Waluigi Effect,' in which a model's behavior can deteriorate unexpectedly. The discussion also covers the challenges of ensuring model alignment and considers alternative strategies, such as Constitutional AI, for improving AI outcomes.
