
915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Exploring Vulnerabilities in Language Models and Their Evaluation
This chapter explores strategic methods for extracting sensitive information from language models, noting that such tactics become cheaper as inference costs fall. It also introduces the SORI benchmark, designed to assess models' weaknesses against a range of threats, including political bias and coercion.