Adversarial Machine Learning

Oxide and Friends

Exploring Adversarial Tactics on Language Models

This chapter covers the shift from attacking image recognition models to applying adversarial tactics to language models. It discusses applications of multi-modal models, the challenges of adapting adversarial techniques to language models, and how vulnerable models are to subtle perturbations. The conversation also touches on training anti-models, manipulating chat models into producing specific responses, and the strategy of optimizing inputs around the word 'sure' to elicit negative or harmful responses.

