
Collin Burns
Researcher specializing in weak-to-strong generalization
Best podcasts with Collin Burns
Ranked by the Snipd community

Mar 26, 2024 • 35min
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Guest Collin Burns discusses weak-to-strong generalization in AI alignment: fine-tuning a strong model on labels produced by a weaker model and asking whether the strong model can outperform its weak supervisor. Techniques such as an auxiliary confidence loss show promise in improving weak-to-strong generalization, which suggests that progress is possible on aligning superhuman models under human supervision.