
Connor Leahy on AI Safety and Why the World is Fragile
Future of Life Institute Podcast
Encoding Human Preferences and the Ontology of Weight Shifts
Reinforcement learning from human feedback. This is a commonly used technique at the moment, which I have strong technical disagreements with. The problem here is that you don't really know which goals you're encoding in the agent. You don't know how the agent, or how the AI model, is interpreting your thumbs down or thumbs up. That's kind of like how RLHF is.
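To make the critique concrete, here is a minimal sketch (not from the episode) of how RLHF-style feedback is typically encoded: thumbs-up/thumbs-down preferences are distilled into a scalar reward model, and the policy is later optimized against that proxy. The reward signal records *which* response the human preferred, but not *why*, which is the ambiguity being pointed at. The model sizes and embeddings below are illustrative assumptions, not any particular system's setup.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a fixed-size embedding of a model response to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the "thumbs up" response above the
    # "thumbs down" one. Nothing in this loss captures the human's reason
    # for the preference.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for real response representations.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```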