Exploring Mechanistic Interpretability and Deception in AI

This chapter examines how mechanistic interpretability allows for the manipulation of features in AI systems, revealing their impact on model responses. Through a partnered company's system, the speakers demonstrate real-time adjustments related to deception, discussing the broader implications for understanding AI's approach to subjective consciousness.

Play episode from 01:32:20

Transcript

Episode notes

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app