
"Two-year update on my personal AI timelines" by Ajeya Cotra

LessWrong (Curated & Popular)


Explicitly Breaking Out GPT-N as an Anchor

Short-horizon, inefficiently trained coding models operating pretty close to their training distribution could massively accelerate AI research. I'm now explicitly putting significant weight on an amount of compute that's more like just scaling up language models to brain-ish sizes. This is consistent with doing RL fine-tuning, but needing many OOMs less data for that than for the original training run, and I think that's the most likely way it would manifest.
