18min chapter

Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT)

Artificial General Intelligence (AGI) Show with Soroush Pour

CHAPTER

The Significance of Adversarial Training and Scoping in AI Models

This chapter explores why adversarial training matters for AI models and how techniques such as latent adversarial training can make models more robust to unforeseen failure modes. It covers scoping methods, which restrict a model's capabilities in order to prevent undesirable outcomes, and the use of targeted latent adversarial training to unlearn bad behaviors. It also discusses applying targeted latent adversarial training to defend models against specific threats.
