Erik Jones

PhD candidate at Berkeley, focusing on robustness and alignment in large language models. His research explores automated methods for identifying model failure modes.

Best podcasts with Erik Jones

Aug 11, 2023 • 23min

Erik Jones on Automatically Auditing Large Language Models

Erik Jones, a PhD candidate at Berkeley, examines how to make large language models safer and better aligned. He discusses his paper on automatically auditing these models and the vulnerabilities they face from adversarial attacks. Erik explains why discrete optimization matters and how it can reveal hidden model behaviors. He also covers the implications of using language models for sensitive topics and the need for automated auditing methods to ensure reliability and robustness in AI systems.