
#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments

80,000 Hours Podcast

Exploring Future Assessments of AI Model Welfare

This chapter explores how AI model welfare might be assessed in the future using interpretability techniques. It emphasizes understanding AI behaviors and preferences by examining a model's internal mechanisms, rather than relying solely on self-reports and external observations.
