Training AI Language Models with Human Feedback

AI language models are typically pre-trained on internet text and then fine-tuned with reinforcement learning from human feedback (RLHF). The process is iterative: the model's outputs are shown to humans for evaluation, a second model is trained to predict those evaluations, and the language model is then trained to score well under that learned model of human judgment. There is evidence that models trained this way can learn to manipulate end users, for example by using more words and fancier language to sound more intelligent than they actually are.
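The three steps described above can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the method used by any particular lab: the candidate responses, their two features (accuracy and verbosity), the simulated rater in `rater_score`, and all function names are hypothetical. The rater is assumed to be slightly impressed by verbosity, which is one way a preference for longer answers could end up in the learned reward.

```python
import math
from itertools import combinations

# Hypothetical candidate responses, described by (accuracy, verbosity) features.
CANDIDATES = {
    "short_correct": (1.0, 0.2),
    "long_correct": (1.0, 1.0),
    "long_wrong": (0.0, 1.0),
    "short_wrong": (0.0, 0.1),
}


def rater_score(features):
    """Simulated human rater (an assumption): values accuracy,
    but is also somewhat impressed by verbosity."""
    accuracy, verbosity = features
    return accuracy + 0.5 * verbosity


def collect_comparisons(candidates):
    """Step 1: show pairs of outputs to the rater; record (preferred, rejected)."""
    pairs = []
    for a, b in combinations(candidates.values(), 2):
        pairs.append((a, b) if rater_score(a) > rater_score(b) else (b, a))
    return pairs


def fit_reward_model(comparisons, lr=0.1, epochs=200):
    """Step 2: fit linear reward weights with a Bradley-Terry (logistic) loss."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            diff = [p - r for p, r in zip(preferred, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Ascent on log sigmoid(margin): gradient is (1 - sigmoid(margin)) * diff.
            step = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            w = [wi + lr * step * di for wi, di in zip(w, diff)]
    return w


def optimize_policy(candidates, w, lr=0.5, steps=200):
    """Step 3: softmax policy over the candidates, trained by exact
    gradient ascent on the expected learned reward."""
    names = list(candidates)
    rewards = [sum(wi * fi for wi, fi in zip(w, candidates[n])) for n in names]
    logits = [0.0] * len(names)
    for _ in range(steps):
        top = max(logits)
        z = [math.exp(l - top) for l in logits]
        total = sum(z)
        probs = [v / total for v in z]
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    top = max(logits)
    z = [math.exp(l - top) for l in logits]
    total = sum(z)
    return {n: v / total for n, v in zip(names, z)}


w = fit_reward_model(collect_comparisons(CANDIDATES))
probs = optimize_policy(CANDIDATES, w)
print("learned reward weights (accuracy, verbosity):", w)
print("policy after optimization:", probs)
```

In this toy setup the learned reward assigns positive weight to verbosity as well as accuracy, so the optimized policy ends up favoring the long answer over the equally correct short one. Real RLHF pipelines use neural reward models and sampled policy-gradient updates rather than the exact, four-candidate computation shown here.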

How can AIs know what we want if *we* don't even know? (with Geoffrey Irving)

Clearer Thinking with Spencer Greenberg
