Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT)

Artificial General Intelligence (AGI) Show with Soroush Pour

CHAPTER

The Significance of Adversarial Training and Scoping in AI Models

This chapter covers adversarial training for AI models and how techniques such as latent adversarial training can make models more robust to failures that were not anticipated during training. It also discusses scoping methods, which restrict a model's capabilities to prevent undesirable outcomes, and the use of targeted latent adversarial training both to unlearn bad behaviors and to defend models against specific threats.

