On Adversarial Training & Robustness with Bhavna Gopal

Thinking Machines: AI & Philosophy

Understanding Instincts: The Limits of Human Explainability

Human decision-making and language model responses are alike in that both often run on instinct rather than conscious reasoning. Asked to explain their actions, people tend to offer after-the-fact justifications that may not reflect the intuition that actually drove the decision. Just as optical illusions can deceive people, perception can be shaped by factors outside conscious awareness. This is what makes true explainability so hard to achieve: many decisions stem from subconscious associations formed through repeated experience.
