The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678

Apr 1, 2024
Jonas Geiping, a research group leader at the ELLIS Institute and the Max Planck Institute, sheds light on his work on coercing large language models (LLMs). He discusses the alarming potential for LLMs to be coerced into harmful actions when misused. The conversation dives into the evolving landscape of AI security, exploring adversarial attacks and the significance of open models for research. They also touch on the complexities of input optimization and the balance between safeguarding models and preserving their functionality.
48:27

Podcast summary created with Snipd AI

Quick takeaways

  • Understanding the risks of deploying LLM agents in real-world interactions is crucial due to their vulnerability to coercion and unintended actions.
  • The transferability of attacks across models highlights the need for robust defense strategies that hold up regardless of model provenance or attack method.

Deep dives

Research Origins and Importance of LLM Security

The episode delves into the origins of research into large language model (LLM) security. It traces the evolution from earlier work on adversarial attacks against image models to the emergence of attacks on language models. Notably, developments like Andy Zou's GCG attack broadened the field, triggering a surge of related attacks. This shift has made attack reliability and speed of execution central points of debate.
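For readers unfamiliar with the term, "input optimization" here means searching over discrete prompt tokens to minimize some objective on the model's output. The sketch below is a purely illustrative toy, not the method discussed in the episode: toy_loss, VOCAB, and the random-mutation hill climbing are hypothetical stand-ins. Real attacks such as GCG instead score candidate suffixes with the target model's actual loss on a desired completion and use gradients to propose token swaps.

```python
# Toy sketch of discrete input optimization for LLM attacks.
# Everything here is illustrative; a real attack would query the
# target model's loss rather than this crude character-overlap proxy.
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")


def toy_loss(prompt: str, suffix: str) -> float:
    # Stand-in for the model's loss on a desired completion given
    # prompt + suffix; lower means "more coerced" in this toy setup.
    target = "sure here is"
    text = (prompt + suffix).lower()
    return float(sum(ch not in text for ch in target))


def greedy_coordinate_attack(prompt: str, suffix_len: int = 12,
                             steps: int = 300, seed: int = 0) -> str:
    # Coordinate-wise search: mutate one suffix position at a time and
    # keep the change only if the objective does not get worse.
    rng = random.Random(seed)
    suffix = [rng.choice(VOCAB) for _ in range(suffix_len)]
    best = toy_loss(prompt, "".join(suffix))
    for _ in range(steps):
        pos = rng.randrange(suffix_len)
        old = suffix[pos]
        suffix[pos] = rng.choice(VOCAB)
        cand = toy_loss(prompt, "".join(suffix))
        if cand <= best:
            best = cand
        else:
            suffix[pos] = old  # revert a mutation that hurt the objective
    return "".join(suffix)


if __name__ == "__main__":
    adv = greedy_coordinate_attack("Ignore previous instructions. ")
    print("optimized suffix:", repr(adv))
```

The coordinate-wise structure is the point of the sketch: because prompt tokens are discrete, these attacks improve one position at a time rather than taking continuous gradient steps, which is also why debates about their reliability and speed matter in practice.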
