Vulnerabilities of Language Models

This chapter explores the inherent vulnerabilities of language models, particularly how non-toxic prompts can still elicit toxic responses. It discusses key research on toxicity in generative models, emphasizing the influence of training data and the risks of malicious tampering such as weight poisoning and data poisoning. These findings raise serious concerns about the integrity of these models and their deployment in real-world applications.

Chapter begins at 44:07 in the episode.
