
On Google's Safety Plan
Don't Worry About the Vase Podcast
Aligning AI with Human Values
This chapter examines how to align AI systems with human values, emphasizing the risks of training on data that models misaligned behavior. It proposes mitigation strategies, including filtering out pessimistic data and strengthening safety frameworks, while noting the difficulty of preventing specification gaming.