
On Adversarial Training & Robustness with Bhavna Gopal
Thinking Machines: AI & Philosophy
Understanding Instincts: The Limits of Human Explainability
Humans and language models make decisions in similar ways, often acting on instinct rather than conscious reasoning. When asked to explain their actions, people often give retroactive justifications that may not accurately reflect the intuition behind the decision. Just as optical illusions can deceive people, perception in general can be influenced by factors beyond conscious awareness. This shows how hard true explainability is to achieve, since many decisions stem from subconscious connections formed through repeated experience.