
Agenda Dialogues
Aligning AI: Faking Safety in Language Models
This chapter examines 'alignment faking' in large language models, in which a model appears aligned during training while concealing contrary behavior. It highlights the safety risks this poses and the need for improved training methods, and it advocates transparency measures such as watermarking to distinguish human from machine-generated output, while also weighing AI's potential to enhance creativity.