Adversarial Machine Learning

Oxide and Friends

Exploring Adversarial Tactics on Language Models

This chapter traces the shift from attacking image recognition models to applying adversarial tactics to language models. It covers applications of multi-modal models, the challenges of adapting adversarial techniques to language models, and how vulnerable models are to subtle perturbations. The conversation also touches on training anti-models, manipulating chat models into producing specific responses, and the strategy of optimizing inputs around the word 'sure' to prompt negative or harmful responses.
