Interpolation Hyperparameter

Do you think it would make sense to make the interpolation hyperparameter more context-dependent? I know you have a static lambda that's decided once and stays fixed throughout the model. And I think there's some work from CMU on efficient nearest neighbor language models where they tried using a small network on top of the LM to predict the interpolation parameter. They saw that it massively improves efficiency without sacrificing too much in terms of performance.
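To make the question concrete, here is a minimal sketch of the idea, assuming the usual kNN-LM mixture p(w) = λ·p_kNN(w) + (1 − λ)·p_LM(w). A fixed λ is replaced by a tiny network that maps context features to λ. One plausible way this saves compute is by skipping the costly nearest-neighbor search whenever the predicted λ is small. Everything here is hypothetical and illustrative: the features (LM entropy and max probability), the hand-set weights, the `skip_below` threshold, and all function names are assumptions, not the method from the CMU paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def interpolate(p_lm: Sequence[float], p_knn: Sequence[float], lam: float) -> List[float]:
    """kNN-LM style mixture, per vocabulary item: lam * p_knn + (1 - lam) * p_lm."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [lam * k + (1.0 - lam) * l for l, k in zip(p_lm, p_knn)]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (nats) of a distribution; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def features_from_lm(p_lm: Sequence[float]) -> List[float]:
    """Hypothetical context features: how uncertain the base LM is."""
    return [entropy(p_lm), max(p_lm)]


class LambdaPredictor:
    """Hypothetical one-hidden-layer MLP (ReLU) mapping features to lambda in (0, 1)."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def __call__(self, features: Sequence[float]) -> float:
        hidden = [
            max(0.0, sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def next_token_distribution(
    p_lm: Sequence[float],
    retrieve_knn: Callable[[], Sequence[float]],
    predictor: LambdaPredictor,
    skip_below: float = 0.1,
) -> Tuple[List[float], bool]:
    """Return (distribution, whether retrieval ran). Retrieval is skipped if lambda is small."""
    lam = predictor(features_from_lm(p_lm))
    if lam < skip_below:
        # Skip the expensive kNN search and fall back to the base LM alone.
        return list(p_lm), False
    return interpolate(p_lm, retrieve_knn(), lam), True
```

With hand-set weights such as `LambdaPredictor([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, -2.0], 0.0)`, a confident LM (low entropy, high max probability) gets a small λ and skips retrieval, while a uniform, uncertain LM gets a large λ and leans on the retrieved neighbors. In practice the weights would be trained rather than hand-picked.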

Excerpt starts at 14:15 in the episode.

