
Is AI Alive? | Episode #66 | For Humanity: An AI Risk Podcast

For Humanity: An AI Safety Podcast


Exploring Mechanistic Interpretability and Deception in AI

This chapter examines how mechanistic interpretability makes it possible to manipulate features inside AI systems and observe how those changes affect the model's responses. Using a partner company's system, the speakers demonstrate real-time adjustments to features related to deception and discuss the broader implications for understanding how AI approaches subjective consciousness.

