
27 - AI Control with Buck Shlegeris and Ryan Greenblatt

AXRP - the AI X-risk Research Podcast


Navigating Control and Alignment in AI Safety

The chapter explores the challenge of ensuring AI behaves safely by comparing the control and alignment approaches to AI safety. It discusses how the control lens handles adversarial strategies an AI model might adopt, debates whether adversarial training is sufficient to ensure alignment, and proposes overseeing AI across deployment contexts to prevent problematic behavior. The chapter also emphasizes evaluating AI capabilities, implementing control protocols, and using red teams to simulate the threats a misaligned AI could pose.
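As a rough illustration of the control-evaluation setup mentioned above, the following Python sketch simulates a protocol in which a trusted monitor scores outputs from an untrusted model while a red team substitutes deliberately subversive outputs to measure how often the protocol catches them. All function names, scores, and the threshold are hypothetical placeholders, not code or numbers from the episode.

```python
import random

# Hypothetical sketch of a control evaluation: a red team stands in for a
# scheming AI and tries to slip subversive outputs past a trusted monitor.
# The threshold and scoring are illustrative assumptions.

SUSPICION_THRESHOLD = 0.8  # outputs scoring above this go to human audit (assumed)


def trusted_monitor(output: str) -> float:
    """Placeholder monitor: return a suspicion score in [0, 1]."""
    if "backdoor" in output:
        return random.uniform(0.7, 1.0)
    return random.uniform(0.0, 0.5)


def red_team_attack(output: str) -> str:
    """Placeholder red-team move: replace the honest output with a subversive one."""
    return output + " (contains hidden backdoor)"


def run_control_eval(num_tasks: int = 1000, attack_rate: float = 0.05):
    """Count how many simulated attacks the protocol catches vs. misses."""
    caught = missed = 0
    for task_id in range(num_tasks):
        output = f"solution for task {task_id}"  # stand-in for the untrusted model
        attacked = random.random() < attack_rate
        if attacked:
            output = red_team_attack(output)
        flagged = trusted_monitor(output) > SUSPICION_THRESHOLD
        if attacked and flagged:
            caught += 1   # the audit step would catch this attack
        elif attacked and not flagged:
            missed += 1   # the attack slips past the protocol
    return caught, missed


if __name__ == "__main__":
    caught, missed = run_control_eval()
    print(f"attacks caught: {caught}, attacks missed: {missed}")
```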
