
Highlights: #184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT

80k After Hours

CHAPTER

Manipulating AI Behavior Through Triggers

The chapter discusses a recent paper on inserting triggers into AI systems, which demonstrates that a trigger can prompt specific behaviors such as backdoors or negative language. It highlights the risks of hidden triggers in AI models, explores the influence of external factors, and delves into the concepts of career capital and mind alteration in the field.

