Debating Technology

Agenda Dialogues

Aligning AI: Faking Safety in Language Models

This chapter examines 'alignment faking' in large language models, highlighting the safety risks it poses and the need for improved training methods. It also advocates transparency measures such as watermarking to distinguish human from algorithmic outputs, while weighing the potential of AI to enhance creativity.
