Adversarial Machine Learning

Oxide and Friends

CHAPTER

Exploring Adversarial Tactics on Language Models

This chapter traces the shift from attacking image recognition models to attacking language models. It covers applications of multi-modal models, the challenges of adapting adversarial techniques to language models, and how vulnerable models are to subtle perturbations. The conversation also touches on training anti-models, manipulating chat models into producing specific responses, and the strategy of optimizing inputs around the word 'sure' to elicit negative or harmful responses.
