
915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Attacks on Language Models: Understanding Risks and Vulnerabilities
This chapter explores the technical details of executing attacks on language models, emphasizing the differences between white-box and black-box settings. It discusses vulnerabilities such as adversarial attacks and prompt stealing, and highlights the importance of defense strategies and continuous evaluation. It also addresses the misuse of these models to extract sensitive information and the implications of such actions.