915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi

Super Data Science: ML & AI Podcast with Jon Krohn

00:00

Exploring Vulnerabilities in Language Models and Their Evaluation

This chapter explores strategic methods for obtaining sensitive information from language models, focusing on the reduced costs of such tactics as inference improves. It also introduces the SORI benchmark, designed to assess models' weaknesses against various threats, including political bias and coercion.

