Is Distillation of Student Teacher Models the Best Practical Approach?
Right now we have a form of what we call focal distillation. Distillation is where you take a model and then you train another model to mimic the input/output behaviour of the first. That second model could be smaller or could have other characteristics. You don't want to imitate the old model exactly, because then you would also inherit its mistakes. It's an active area of research, and we both read about methods that are coming up that improve on that.
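The idea described above, training a student model to mimic a teacher's output behaviour, is usually implemented by matching the two models' output distributions. As a minimal sketch (not the speaker's actual method), here is the standard soft-target distillation loss in pure Python: the teacher's logits are softened with a temperature, and the student is penalised by the KL divergence between the two distributions. The logit values and temperature below are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Soften the distribution: a higher temperature spreads probability
    # mass across classes, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student
    # distributions: the quantity the student is trained to minimise.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits for one example with three classes.
teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))              # → 0.0
print(distillation_loss(teacher, [0.0, 0.0, 0.0]) > 0)  # → True
```

A student whose outputs already match the teacher's incurs zero loss; any mismatch gives a positive penalty that gradient descent can then reduce. In practice this soft-target term is typically mixed with the ordinary hard-label loss, which is one way to avoid blindly inheriting the teacher's mistakes.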