Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Exploring Vulnerabilities in Language Models

This chapter discusses how well attacks transfer across open and black-box models, emphasizing how architecture and training data influence their efficacy. It covers several types of adversarial attacks, including jailbreak and misdirection attacks, and the inherent limitations of current defenses in language models. The conversation also highlights the practical implications and complexities of manipulating large language models, with a focus on their susceptibility to structured inputs and role hacking.
