Practical AI Safety (ft Kyle Clark)

Tool Use - AI Conversations

Why Post-Training Safety Has Limits

Kyle describes how attention mechanisms let jailbreaks succeed: attackers stuff the context with irrelevant tokens and then slip override instructions into it.

Play episode from 08:28