Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CHAPTER

Exploring Vulnerabilities in Language Models

This chapter discusses how attacks transfer between open and black-box models, emphasizing how architecture and training data influence attack efficacy. It covers several types of adversarial attacks, including jailbreak and misdirection attacks, and the inherent limitations of current defenses for language models. The conversation also highlights the practical implications and complexities of manipulating large language models, with a focus on their susceptibility to structured inputs and role hacking.
