With convolutional neural networks, interpretability saw many good advances. We can roughly understand that different layers are looking at different ranges of the input. With large language models, it's perhaps a little more complicated. But I think it's still achievable, in the sense that we could ask: what kinds of prompts does the model degrade on if I drop out this part of the network? From there we can start getting at a language to even describe these aspects of human behavior or psychology from the spoken part, the language part.