I'm glad you clarified the terminology. I guess all these techniques are referred to as neural architecture search. Ours is actually a direct backprop solution. When we evaluate this at inference time, it actually does give a significant benefit: we were able to compress a model roughly 10x. So basically, the architecture with the minimum loss was 0.4442, and we compressed that by 15.1x, which increased the loss to 0.4454.
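To make the idea of compression learned "directly by backprop" concrete, here is a minimal sketch assuming a learnable-gate pruning setup, where per-weight gates are trained end-to-end alongside the task loss and the compression ratio is read off from how many gates survive. This is an illustrative assumption, not the speaker's actual method; names like `GatedLinear`, `gate_logits`, and the sparsity penalty weight are hypothetical.

```python
# Sketch: differentiable compression via learnable per-weight gates trained by backprop.
# Not the method described in the episode; purely illustrative.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose weights are scaled by learnable sigmoid gates."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One gate logit per weight; sigmoid(gate) acts as a soft keep/drop switch.
        self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)
        return nn.functional.linear(x, self.linear.weight * gates, self.linear.bias)

    def sparsity_penalty(self):
        # Pushes gates toward zero, i.e. toward a smaller effective model.
        return torch.sigmoid(self.gate_logits).mean()

    def compression_ratio(self, threshold=0.5):
        # Ratio of total weights to weights whose gate survives thresholding.
        gates = torch.sigmoid(self.gate_logits)
        kept = (gates > threshold).float().sum().clamp(min=1)
        return gates.numel() / kept.item()

# Toy training loop: task loss plus sparsity penalty, optimized jointly by backprop.
model = GatedLinear(64, 10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(100):
    loss = nn.functional.cross_entropy(model(x), y) + 0.1 * model.sparsity_penalty()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"compression ~{model.compression_ratio():.1f}x")
```

In a setup like this, the trade-off the speaker quotes (loss 0.4442 uncompressed vs. 0.4454 at 15.1x) corresponds to comparing the task loss before and after applying the learned gates and discarding the pruned weights.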