Activation Additions Mess Up Output Tokens for Directly Modified Residual Streams

Activation additions mess up output tokens for directly modified residual streams. Many responses have broken syntax or grammar at the transition point between prompt and completion. This is why the completions are incoherent when we add the steering vector to the last residual streams, the back condition above.

Play episode from 36:31

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app