Right now I'm thinking of applying BIMP to pre-trained large language models. So hopefully the swapping would be most valuable, you know, at the end of training. We'll see how it works on language models. But since you brought up that point, I got a little bit concerned. I tend to agree with you that maybe swapping is most valuable when there are many lottery-ticket directions you can go. After you branch into a basin of attraction, or lottery ticket, so to speak, swapping is no longer important because you're already in that basin. You don't need to select which basin. It's just my conjecture.
