
Introduction to AI Control
AI Safety Fundamentals
Trusted Monitoring and Editing
This episode explains how a weaker, trusted model can monitor and edit the outputs of a stronger, untrusted model, and discusses results from Redwood Research.


