A, Ter I Memorize, and It's Not Computed Oftima

One epoch is, i don't think one epoch is very meaningful here. It's just like, well, it got to see every doubt of it right on in the training do and so, you know, it' seen the whole train, got distribution. He hasn't seen it multiple times. Thing is more just a, oh, our training process just performs better when it can extract, like, its sort of already extracted the information from that, from that data point, the first time through. And there's sot of diminishing returns from trying to run it through a second time.

Play episode from 01:00:05

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app