I'm glad you clarified the terminology. I guess all these techniques are referred to as neural architecture search. Ours is actually a direct backprop solution. When we evaluate this at inference time, it actually does give a significant benefit: we were able to compress a model roughly 10x. So basically, the architecture with the minimum loss was 0.4442, and we compressed that by 15.1x, which increased the loss to 0.4454.
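To make the idea of compression learned "directly by backprop" concrete, here is a minimal sketch assuming a learnable-gate pruning setup, where per-weight gates are trained end-to-end alongside the task loss and the compression ratio is read off from how many gates survive. This is an illustrative assumption, not the speaker's actual method; names like `GatedLinear`, `gate_logits`, and the sparsity penalty weight are hypothetical.

```python
# Sketch: differentiable compression via learnable per-weight gates trained by backprop.
# Not the method described in the episode; purely illustrative.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose weights are scaled by learnable sigmoid gates."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One gate logit per weight; sigmoid(gate) acts as a soft keep/drop switch.
        self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)
        return nn.functional.linear(x, self.linear.weight * gates, self.linear.bias)

    def sparsity_penalty(self):
        # Pushes gates toward zero, i.e. toward a smaller effective model.
        return torch.sigmoid(self.gate_logits).mean()

    def compression_ratio(self, threshold=0.5):
        # Ratio of total weights to weights whose gate survives thresholding.
        gates = torch.sigmoid(self.gate_logits)
        kept = (gates > threshold).float().sum().clamp(min=1)
        return gates.numel() / kept.item()

# Toy training loop: task loss plus sparsity penalty, optimized jointly by backprop.
model = GatedLinear(64, 10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(100):
    loss = nn.functional.cross_entropy(model(x), y) + 0.1 * model.sparsity_penalty()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"compression ~{model.compression_ratio():.1f}x")
```

In a setup like this, the trade-off the speaker quotes (loss 0.4442 uncompressed vs. 0.4454 at 15.1x) corresponds to comparing the task loss before and after applying the learned gates and discarding the pruned weights.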