Researchers are exploring methods such as causal tracing to probe AI models and adjust their biases, delving into the inner workings of neural networks, particularly large language models, to understand and regulate their complex behaviours.
AIs are often described as 'black boxes', with researchers unable to figure out how they 'think'. To better understand these inscrutable systems, some scientists are borrowing from psychology and neuroscience to design tools that reverse-engineer them, which they hope will lead to safer, more efficient AIs.