"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Dodging Latent Space Detectors: Obfuscated Activation Attacks with Luke, Erik, and Scott.

CHAPTER

Defending Against Obfuscation Attacks

This chapter examines how vulnerable latent-space defenses for large language models are to obfuscation attacks, while stressing why such defenses remain necessary and how they have evolved. It walks through several attack methods and how well different defense strategies hold up against them, including stress-testing models to surface harmful outputs. The discussion argues for an empirical approach to building defenses and reflects on earlier research into what model activations reveal.
