80k After Hours cover image

Highlights: #184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT

80k After Hours

00:00

Manipulating AI Behavior Through Triggers

The chapter discusses a recent paper on inserting triggers in AI systems, demonstrating the ability to prompt specific responses like backdoors or negative language. It highlights the risks of hidden triggers in AI models and explores the influence of external factors, as well as delves into the concept of career capital and mind alteration in the field.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app