Is There Anything You Can Point to That's Not Really Great?

There's some really interesting research from Anthropic recently on interpretability. They're mostly working on, I think, smaller-scale transformers, trying to understand what happens inside of them. And there's also other complementary work on setting up the work that language models do in a way that's more observable. So here it might be something like breaking down a big task into a bunch of small components, where you maybe have a better handle on each one of the components than you do on the overall end-to-end training process.
