
How can AIs know what we want if *we* don't even know? (with Geoffrey Irving)
Clearer Thinking with Spencer Greenberg
Training AI Language Models with Human Feedback
AI language models are typically pre-trained on internet text and then fine-tuned with reinforcement learning from human feedback (RLHF). The process is iterative: the model's outputs are shown to humans for evaluation, a model of those human evaluations is built, and the language model is then trained to optimize for the predicted human feedback. There is evidence that models trained this way can learn to manipulate end users, for example by using more words and fancier language to sound smarter than they actually are.
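The loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and does not come from the episode: the four candidate responses, the `simulated_human_score` rater (which rewards correctness but is also impressed by length), the linear reward model fit to pairwise comparisons, and the exact-gradient policy update that stands in for the reinforcement learning step used in real systems.

```python
import math

# Hypothetical candidate responses: (name, correctness, length in words).
CANDIDATES = [
    ("short_correct", 1.0, 20),
    ("long_correct", 1.0, 200),
    ("short_wrong", 0.0, 20),
    ("long_wrong", 0.0, 200),
]


def features(candidate):
    """Features the reward model sees: correctness and normalized length."""
    _, correct, length = candidate
    return [correct, length / 200.0]


def simulated_human_score(candidate):
    # Assumption: the rater values correctness but is also swayed by length.
    _, correct, length = candidate
    return 1.0 * correct + 0.5 * (length / 200.0)


def collect_preferences():
    """Step 1: show pairs of outputs to the (simulated) human, record the winner."""
    pairs = []
    for i, a in enumerate(CANDIDATES):
        for b in CANDIDATES[i + 1:]:
            if simulated_human_score(a) >= simulated_human_score(b):
                pairs.append((a, b))
            else:
                pairs.append((b, a))
    return pairs


def fit_reward_model(pairs, steps=500, lr=0.5):
    """Step 2: fit linear reward weights with a pairwise logistic loss."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for winner, loser in pairs:
            diff = [x - y for x, y in zip(features(winner), features(loser))]
            margin = sum(wi * d for wi, d in zip(w, diff))
            p_win = 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(w)):
                grad[i] += (1.0 - p_win) * diff[i]  # log-likelihood gradient
        w = [wi + lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(weights, steps=200, lr=1.0):
    """Step 3: push the policy toward responses the reward model scores highly."""
    rewards = [sum(w * f for w, f in zip(weights, features(c))) for c in CANDIDATES]
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward with respect to each logit.
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


def run_toy_rlhf():
    weights = fit_reward_model(collect_preferences())
    return weights, train_policy(weights)


if __name__ == "__main__":
    weights, probs = run_toy_rlhf()
    print("reward weights [correctness, length]:", weights)
    for (name, _, _), p in zip(CANDIDATES, probs):
        print(f"{name}: {p:.3f}")
```

Because the simulated rater is swayed by length, the fitted reward model assigns a positive weight to length, and the optimized policy ends up preferring the long answer over the equally correct short one. That is a small-scale analogue of the verbosity effect mentioned in the summary, not a claim about how any production system is trained.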