How Do You Reduce the Efficiency of Your Model?
The routing to experts is not random. It's a learned process that assigns a given example, based on some embedding of that example, to the correct "expert". You can also think of the reverse process, where you do parameter sharing. This would create a model that's much smaller in terms of memory footprint, because you're storing just one layer's worth of parameters. But your runtime is much more expensive, because that layer is run multiple times. So you end up with a potentially much slower model. There's hardly ever a perfect approach that wins on all fronts.
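To make the two ideas concrete, here is a minimal sketch in plain Python. It is not taken from the episode: the names `route`, `layer_costs`, and `gate_weights`, the top-1 routing rule, and all the sizes are assumptions chosen for illustration. In a real mixture-of-experts model the gate weights would be trained together with the rest of the network; here they are random stand-ins.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(embedding, gate_weights):
    """Top-1 routing: score each expert against the embedding, pick the best.

    gate_weights has one row per expert (hypothetical; normally learned).
    Returns (expert_index, gate_probabilities).
    """
    scores = [sum(w * x for w, x in zip(row, embedding)) for row in gate_weights]
    probs = softmax(scores)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs


def layer_costs(num_layers, params_per_layer, shared):
    """Return (stored_parameters, layer_applications) for a stack of layers.

    With sharing, one set of parameters is stored but it is still applied
    num_layers times, so memory shrinks while compute does not.
    """
    stored = params_per_layer if shared else num_layers * params_per_layer
    return stored, num_layers


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 3
    # Random stand-ins for a trained gate and an example embedding.
    gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    embedding = [random.gauss(0, 1) for _ in range(dim)]

    expert, probs = route(embedding, gate_weights)
    print("chosen expert:", expert)
    print("gate probabilities:", [round(p, 3) for p in probs])

    print("unshared (params, layer runs):", layer_costs(12, 1_000_000, shared=False))
    print("shared   (params, layer runs):", layer_costs(12, 1_000_000, shared=True))
```

The routing half shows that the choice of expert is a deterministic function of the example's embedding and the gate, not a coin flip. The sharing half shows the trade-off in numbers: the shared stack stores 12 times fewer parameters in this toy setting, yet it still performs the same 12 layer applications, so the memory saving buys no speedup.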