Super Data Science: ML & AI Podcast with Jon Krohn cover image

915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi

Super Data Science: ML & AI Podcast with Jon Krohn

00:00

Attacks on Language Models: Understanding Risks and Vulnerabilities

This chapter explores the technical intricacies of executing attacks on language models, emphasizing the differences between white box and black box models. It discusses various vulnerabilities, including adversarial attacks and prompt stealing, while highlighting the importance of defense strategies and continuous evaluation. Additionally, the chapter addresses the misuse of these models for extracting sensitive information and the implications of such actions.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app