
#367 – Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI
Lex Fridman Podcast
How RLHF Aligns Machine Learning Models to Human Preferences
Language models trained on large text corpora hold a great deal of knowledge but can be difficult to use directly. Reinforcement learning from human feedback (RLHF) aligns the model with human preferences: people rate or compare model outputs, a reward signal is derived from that feedback, and reinforcement learning then steers the model toward the preferred behavior. The result is a much more user-friendly and effective tool.
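To make the loop described above concrete, here is a minimal toy sketch of the RLHF idea, not OpenAI's actual pipeline: a tiny "policy" chooses among a few canned responses, a linear reward model is fit from pairwise human preferences (Bradley-Terry style), and a REINFORCE-style update then shifts the policy toward the responses the reward model scores highly. All candidate responses, feature vectors, and learning rates are illustrative assumptions.

```python
# Toy RLHF sketch: preference data -> reward model -> policy update.
import numpy as np

rng = np.random.default_rng(0)

# Step 1: the base model proposes candidate responses. Hand-made feature
# vectors stand in for learned representations of each response.
candidates = ["terse answer", "helpful answer", "rambling answer"]
features = np.array([[1.0, 0.2],   # terse
                     [0.9, 0.9],   # helpful
                     [0.3, 0.1]])  # rambling

# Step 2: humans compare pairs of responses; store (preferred, rejected) indices.
human_preferences = [(1, 0), (1, 2), (0, 2)]  # "helpful" beats the others

# Step 3: fit a linear reward model r(x) = w . x from the pairwise preferences
# by maximizing the Bradley-Terry objective sum log sigmoid(r_win - r_lose).
w = np.zeros(2)
for _ in range(500):
    grad = np.zeros(2)
    for win, lose in human_preferences:
        gap = features[win] @ w - features[lose] @ w
        grad += (1.0 / (1.0 + np.exp(gap))) * (features[win] - features[lose])
    w += 0.1 * grad

# Step 4: reinforcement learning. Adjust the policy's logits toward responses
# the reward model scores highly, using a REINFORCE-style gradient estimate.
logits = np.zeros(len(candidates))
for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(len(candidates), p=probs)
    reward = features[a] @ w
    onehot = np.eye(len(candidates))[a]
    logits += 0.05 * reward * (onehot - probs)

probs = np.exp(logits) / np.exp(logits).sum()
print("policy after RLHF:", dict(zip(candidates, probs.round(2))))
```

Running this prints a policy that concentrates probability on the "helpful answer", mirroring the point from the episode: the raw model already contains the knowledge, and the human-feedback loop mainly reshapes which behaviors it prefers to produce.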