Shaping AI Communication and Behavior

This chapter explores the challenge of shaping how AI systems communicate and behave, emphasizing the importance of human feedback and the need for feedback methods that can capture hard-to-understand preferences. It discusses the problems with shaping AI objectives through proxies and evaluation criteria, giving examples of unintended objectives and lack of control. The chapter also examines the advantages of proxy goals and the potential consequences of selecting for good behavior during AI training.

Play episode from 01:19:42
