Erik Jones
PhD candidate at Berkeley, focusing on robustness and alignment in large language models. His research explores automated methods for identifying model failure modes.
Best podcasts with Erik Jones
Ranked by the Snipd community
Aug 11, 2023 • 23min
Erik Jones on Automatically Auditing Large Language Models
Erik Jones, a PhD candidate at Berkeley, works on improving the safety and alignment of large language models. He discusses his paper on automatically auditing these models and the vulnerabilities they face from adversarial attacks. He explains why discrete optimization matters and how it can reveal hidden model behaviors. He also covers the implications of using language models for sensitive topics and the need for automated auditing methods to ensure reliability and robustness in AI systems.