Pre-training vs RL and reward hacking

Ilya contrasts pre-training's broad data with RL's targeted environments and how researcher incentives can skew training.

Play episode from 02:57

Episode notes

Ilya & I discuss SSI’s strategy, the problems with pre-training, how to improve the generalization of AI models, and how to ensure AGI goes well.

Watch on YouTube; read the transcript.

Sponsors

* Gemini 3 is the first model I’ve used that can find connections I haven’t anticipated. I recently wrote a blog post on RL’s information efficiency, and Gemini 3 helped me think it all through. It also generated the relevant charts and ran toy ML experiments for me with zero bugs. Try Gemini 3 today at gemini.google

* Labelbox helped me create a tool to transcribe our episodes! I’ve struggled with transcription in the past because I don’t just want verbatim transcripts, I want transcripts reworded to read like essays. Labelbox helped me generate the exact data I needed for this. If you want to learn how Labelbox can help you (or if you want to try out the transcriber tool yourself), go to labelbox.com/dwarkesh

* Sardine is an AI risk management platform that brings together thousands of device, behavior, and identity signals to help you assess a user’s risk of fraud & abuse. Sardine also offers a suite of agents to automate investigations so that as fraudsters use AI to scale their attacks, you can use AI to scale your defenses. Learn more at sardine.ai/dwarkesh

To sponsor a future episode, visit dwarkesh.com/advertise.

Timestamps

(00:00:00) – Explaining model jaggedness

(00:09:39) – Emotions and value functions

(00:18:49) – What are we scaling?

(00:25:13) – Why humans generalize better than models

(00:35:45) – SSI’s plan to straight-shot superintelligence

(00:46:47) – SSI’s model will learn from deployment

(00:55:07) – How to think about powerful AGIs

(01:18:13) – “We are squarely an age of research company”

(01:20:23) – Self-play and multi-agent

(01:32:42) – Research taste