The Usability of ChatGPT: The Magic Ingredient of RLHF

ChatGPT is built on a powerful base model that becomes far more usable after reinforcement learning from human feedback (RLHF). The base model learns from a large amount of text data and can do amazing things, but on its own it is not easy to use. RLHF is the magic ingredient that aligns the model with human preferences. In its simplest form, human raters are shown two outputs and asked which one they prefer, and that feedback is fed back into the model. The process works with remarkably little data.
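
The summary does not say how the "which output is better" feedback is turned into a training signal. One common approach, assumed here rather than taken from the episode, is to fit a small reward model with a Bradley-Terry style pairwise loss so that preferred outputs score higher than rejected ones. The sketch below is a toy illustration only: the two features, the data, and every function name are hypothetical, and a real system would score model outputs with a neural network rather than hand-made features.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: small when the preferred output gets the higher reward."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))


def reward(weights, features):
    """Toy linear reward model (an assumption made for illustration)."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(comparisons, n_features, lr=0.5, epochs=200):
    """Fit the reward weights by gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d/d(margin) of -log(sigmoid(margin)) is -(1 - sigmoid(margin)).
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical rater data: each output is described by two made-up features,
# (helpfulness, rudeness). In every pair, raters preferred the first output.
comparisons = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.1]),
]
weights = train_reward_model(comparisons, n_features=2)
```

After training, `reward(weights, chosen)` exceeds `reward(weights, rejected)` for each pair in this toy data set. In a full RLHF pipeline a learned reward of this kind would then guide a reinforcement learning step on the language model itself, which is outside the scope of this sketch.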

Sam Altman is the CEO of OpenAI, the company behind GPT-4, ChatGPT, DALL-E, Codex, and many other state-of-the-art AI technologies. Please support this podcast by checking out our sponsors:
– NetSuite: http://netsuite.com/lex to get a free product tour
– SimpliSafe: https://simplisafe.com/lex
– ExpressVPN: https://expressvpn.com/lexpod to get 3 months free

EPISODE LINKS:
Sam’s Twitter: https://twitter.com/sama
OpenAI’s Twitter: https://twitter.com/OpenAI
OpenAI’s Website: https://openai.com
GPT-4 Website: https://openai.com/research/gpt-4

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
– Check out the sponsors above; it’s the best way to support this podcast
– Support on Patreon: https://www.patreon.com/lexfridman
– Twitter: https://twitter.com/lexfridman
– Instagram: https://www.instagram.com/lexfridman
– LinkedIn: https://www.linkedin.com/in/lexfridman
– Facebook: https://www.facebook.com/lexfridman
– Medium: https://medium.com/@lexfridman

OUTLINE:
Here are the timestamps for the episode. On some podcast players you should be able to click a timestamp to jump to that time.
(00:00) – Introduction
(08:41) – GPT-4
(20:06) – Political bias
(27:07) – AI safety
(47:47) – Neural network size
(51:40) – AGI
(1:13:09) – Fear
(1:15:18) – Competition
(1:17:38) – From non-profit to capped-profit
(1:20:58) – Power
(1:26:11) – Elon Musk
(1:34:37) – Political pressure
(1:52:51) – Truth and misinformation
(2:05:13) – Microsoft
(2:09:13) – SVB bank collapse
(2:14:04) – Anthropomorphism
(2:18:07) – Future applications
(2:21:59) – Advice for young people
(2:24:37) – Meaning of life