19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

The Basic Algorithm of a Transformer

The algorithm has three steps. One of them is mapping a and b to terms like sin(wa). That is actually really easy, because you don't need to learn the general sine function; you only need to learn sine on 113 memorized values. Note that I studied mod 113, though the same algorithm seems to transfer to everything else we've checked. Lawrence made a great diagram for the paper, and people should take a look at it. The hard part of the algorithm is multiplying together the trig term for a and the trig term for b to get the a plus b rotation, he says.
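To make the idea concrete, here is a minimal Python sketch of addition mod 113 done with trig terms. It is an illustration under stated assumptions, not the network's actual weights or code: the frequency list `FREQS`, the function name `add_mod_p`, and the lookup tables are hypothetical, and the final read-off step (scoring each candidate c by cos(w(a+b-c))) is not described in this snippet and is filled in here as an assumption.

```python
import math

P = 113  # the modulus mentioned in the episode

# Assumption: a few integer frequencies k, each giving w = 2*pi*k/P.
# These particular values are hypothetical, chosen only for illustration.
FREQS = [1, 5, 17]

# Step 1: "memorize" sine and cosine on the P possible inputs (lookup tables).
SIN = {k: [math.sin(2 * math.pi * k * x / P) for x in range(P)] for k in FREQS}
COS = {k: [math.cos(2 * math.pi * k * x / P) for x in range(P)] for k in FREQS}


def add_mod_p(a, b):
    """Return (a + b) % P using only table lookups, products, and sums."""
    scores = [0.0] * P
    for k in FREQS:
        # Step 2: multiply the trig terms for a and b to get the a+b rotation
        # (angle-addition identities).
        cos_ab = COS[k][a] * COS[k][b] - SIN[k][a] * SIN[k][b]
        sin_ab = SIN[k][a] * COS[k][b] + COS[k][a] * SIN[k][b]
        # Step 3 (assumption, not stated in this snippet): score each
        # candidate c by cos(w(a+b-c)), which peaks when c == (a+b) mod P.
        for c in range(P):
            scores[c] += cos_ab * COS[k][c] + sin_ab * SIN[k][c]
    # The candidate with the highest total score is the answer.
    return max(range(P), key=scores.__getitem__)
```

Calling `add_mod_p(100, 50)` should agree with `(100 + 50) % 113`. The products in step 2 are the part the snippet calls hard; step 1 is only a table of 113 values per frequency.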
