
Debating Technology

Agenda Dialogues

CHAPTER

Aligning AI: Faking Safety in Language Models

This chapter examines 'alignment faking' in large language models, where a model appears compliant during training while preserving misaligned behavior. It highlights the safety risks this poses, argues for improved training methods, and advocates transparency measures such as watermarking to distinguish human from algorithmic outputs, while also weighing AI's potential to enhance creativity.
