
Erik Jenner
Lead author of "Obfuscated Activations Bypass Large Language Model Latent-Based Defenses," contributing to the understanding of AI safety and latent space defenses.
Best podcasts with Erik Jenner
Ranked by the Snipd community

31 snips
Jan 18, 2025 • 2h 10min
Dodging Latent Space Detectors: Obfuscated Activation Attacks with Luke, Erik, and Scott.
Luke Bailey and Erik Jenner, both leading experts on AI safety, dive into their research on obfuscated activation attacks. They dissect methods for bypassing latent-based defenses in AI while examining the vulnerabilities these systems face. The conversation covers backdoor attacks, the importance of diverse datasets, and the ongoing challenge of improving model robustness. Their work sheds light on the cat-and-mouse game between attackers and defenders, making it clear that the future of AI safety is as intricate as it is essential.

Dec 12, 2024 • 24min
38.3 - Erik Jenner on Learned Look-Ahead
Erik Jenner, a third-year PhD student at UC Berkeley's Center for Human-Compatible AI, dives into the fascinating world of neural networks in chess. He explores how these AI models exhibit learned look-ahead abilities, questioning whether they strategize like humans or rely on clever heuristics. The discussion also covers experiments assessing future planning in decision-making, the impact of activation patching on performance, and the relevance of these findings to AI safety and existential risk. Jenner's insights challenge our understanding of AI behavior in complex games.