3min snip

The AI moment, AI vs crypto: a heated debate with Amjad Masad, Flo Crivello, and Nathan Labenz

"Moment of Zen"

NOTE

How LLMs and RLHF work

Large language models, LLMs for short, are first trained through a process called pre-training, in which the model learns to predict the next word, or token, across a very large text corpus drawn largely from the internet. In this account, the model's intelligence emerges from that pre-training step. Reinforcement learning from human feedback (RLHF) is a newer step in which an optimization objective (a reward function) is built to align the model's outputs with human preferences. In RLHF, the model is given instructions: responses that follow them are rewarded, and responses that deviate from them are penalized. Repeating this process makes the model follow instructions and learn from human feedback.
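To make the two stages concrete, here is a minimal Python sketch, not how any production system is built. It assumes a bigram counter as a stand-in for next-token pre-training, and a tiny softmax policy over two canned responses that is updated with a REINFORCE-style rule as a stand-in for RLHF. The names `pretrain_bigram`, `predict_next`, `rl_finetune` and `follows_instruction` are hypothetical, and the hand-written reward replaces what would normally be a reward model trained on human preference data.

```python
import math
import random
from collections import Counter, defaultdict


def pretrain_bigram(corpus):
    """Toy 'pre-training': count which token follows which in the text."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent token seen after `token`, or None if unseen."""
    followers = counts.get(token)
    return followers.most_common(1)[0][0] if followers else None


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def follows_instruction(response):
    """Assumed stand-in for human feedback on 'answer in one word':
    +1 (reward) for a one-word answer, -1 (penalty) otherwise."""
    return 1.0 if len(response.split()) == 1 else -1.0


def rl_finetune(responses, reward_fn, steps=500, lr=0.1, seed=0):
    """Toy feedback loop: sample a response, score it, nudge the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(responses)  # start with no preference
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(responses)), weights=probs)[0]
        reward = reward_fn(responses[i])
        # REINFORCE update: d log p(i) / d logit_j = 1[j == i] - probs[j]
        for j in range(len(logits)):
            indicator = 1.0 if j == i else 0.0
            logits[j] += lr * reward * (indicator - probs[j])
    return softmax(logits)


counts = pretrain_bigram("the cat sat on the mat the cat ran")
print(predict_next(counts, "the"))

responses = ["Paris", "Well, France has had several capitals over time"]
print(rl_finetune(responses, follows_instruction))
```

After the loop, nearly all of the probability mass sits on the one-word answer, which is the "reward when it follows instructions, penalize when it deviates" idea in miniature. Real RLHF applies the same kind of loop to a neural network's token probabilities, with rewards coming from a learned model rather than a fixed rule.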
