How Do You Reduce the Efficiency of Your Model?

The routing to experts is not random. It's a learned process that learns to assign a given example based on some embedding of that example, the correct quote unquote expert. You can also think of the reverse process where you do parameter sharing. This would create a model that's much smaller in terms of memory footprint because you're just using one layer of your parameters. But your runtime is much more expensive because each layer is run multiple times. So you end up with a potentially much slower model. There's hardly ever a perfect approach that wins on off front.

Play episode from 15:43

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app