Lawrence Chan

Alignment problem and artificial general intelligence

Best podcasts with Lawrence Chan

Ranked by the Snipd community

May 13, 2023 • 34min

The Alignment Problem From a Deep Learning Perspective

Guests Richard Ngo, Lawrence Chan, and Sören Mindermann discuss the dangers of artificial general intelligence pursuing undesirable goals. They explore topics such as reward hacking, situational awareness in policies, internally represented goals in deep learning models, the inner alignment problem, deceptive alignment in AI systems, and the risks of AGIs gaining power. They highlight the need for preventative measures to ensure human control over AGI.

Jun 24, 2024 • 13min

AF - Compact Proofs of Model Performance via Mechanistic Interpretability by Lawrence Chan

Lawrence Chan discusses using mechanistic interpretability to create compact proofs of model performance. Topics include proof strategies for small transformers, the importance of mechanistic understanding for tighter bounds, the challenges of scaling proofs, and structureless noise in model behavior.
