
Connor Leahy on AI Safety and Why the World is Fragile
Future of Life Institute Podcast
Encoding Human Preferences and the Ontology of Weight Shifts
Reinforcement learning from human feedback. This is a commonly used technique at the moment, which I have strong technical disagreements with. The problem here is that you don't really know which goals you're encoding in the agent. You don't know how the agent, or how the AI model, is interpreting your thumbs down or thumbs up. That's kind of like how RLHF is.
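To make the critique concrete, here is a minimal sketch (not from the episode) of how RLHF-style feedback is typically encoded: thumbs-up/thumbs-down preferences are distilled into a scalar reward model, and the policy is later optimized against that proxy. The reward signal records *which* response the human preferred, but not *why*, which is the ambiguity being pointed at. The model sizes and embeddings below are illustrative assumptions, not any particular system's setup.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a fixed-size embedding of a model response to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the "thumbs up" response above the
    # "thumbs down" one. Nothing in this loss captures the human's reason
    # for the preference.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings stand in for real response representations.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```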