
Highlights: #184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT
80k After Hours
Manipulating AI Behavior Through Triggers
The chapter discusses a recent paper on inserting triggers in AI systems, demonstrating the ability to prompt specific responses like backdoors or negative language. It highlights the risks of hidden triggers in AI models and explores the influence of external factors, as well as delves into the concept of career capital and mind alteration in the field.
00:00
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.