The Importance of Optimizing Twitter Constraints

GPT-3 will give the most probable answer or if we say we're talking to some smart debater, it will do the most obvious smart argument. But even RHF models like reinforcement learning using human feedback models who are not specifically optimized still have this behavior which is just they favor a lot more the key arguments and forget a bit about sub arguments.

Play episode from 15:22

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app