Is Power-Seeking AI an Existential Risk?

AI Safety Fundamentals: Alignment

Shaping AI Communication and Behavior

This chapter explores the challenge of shaping AI systems' communication and behavior, emphasizing the importance of human feedback and the need for feedback methods that capture hard-to-understand preferences. It discusses the problem with shaping AI objectives using proxies and evaluation criteria, providing examples of unintended objectives and lack of control. The chapter also examines the advantages of proxy goals and the potential consequences of selecting for good behavior in AI training.
