Transfer Learning
We're training an 11 billion parameter model on about a trillion tokens of text. That takes a ton of computation. But once you've already done that, you sort of amortize the cost, and it makes it much cheaper for a practitioner to use the results. The expensive part is kind of paid ahead of time when you do the pre-training.
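
To make the amortization point concrete, here is a minimal sketch of what the "cheap" practitioner side looks like: downloading pre-trained weights and running a short fine-tuning loop on a tiny downstream task. It assumes the Hugging Face transformers library and the public "t5-small" checkpoint, neither of which is named in the episode; the toy dataset is purely illustrative.

```python
# Illustrative sketch (not from the episode): reuse pre-trained weights and
# fine-tune cheaply. The expensive pre-training was already paid for upstream.
import torch
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Pre-trained weights are simply downloaded; no trillion-token training run needed.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tiny toy dataset standing in for a downstream task.
examples = [
    ("translate English to German: Hello, how are you?", "Hallo, wie geht es dir?"),
]

optimizer = AdamW(model.parameters(), lr=3e-4)
model.train()
for src, tgt in examples:
    inputs = tokenizer(src, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    # The model returns a loss when labels are supplied.
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The fine-tuning loop above runs in seconds on a laptop, which is the contrast the quote is drawing: the heavy cost lives entirely in the one-time pre-training step.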