Is Power-Seeking AI an Existential Risk?

AI Safety Fundamentals: Alignment

Shaping AI Communication and Behavior

This chapter explores the challenge of shaping how AI systems communicate and behave, emphasizing the importance of human feedback and the need for feedback methods that capture hard-to-understand preferences. It discusses the problems with shaping AI objectives through proxies and evaluation criteria, with examples of unintended objectives and lack of control. The chapter also examines the advantages of proxy goals and the potential consequences of selecting for good behavior during AI training.
