The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678

Apr 1, 2024
Jonas Geiping, a research group leader at the ELLIS Institute and the Max Planck Institute, discusses his work on coercing large language models (LLMs) to do and reveal almost anything. He describes the alarming potential for LLMs to be driven into harmful actions when misused. The conversation covers the evolving landscape of AI security, adversarial attacks, and the significance of open models for research, and touches on the mechanics of input optimization and the balance between safeguarding models and preserving their functionality.
AI Snips
INSIGHT

LLM Attacks Now Feasible

  • Adversarial attacks were long considered difficult to mount against large language models.
  • Open-source models and new discrete optimization algorithms have made both the attacks and research on them feasible (a minimal sketch follows below).
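
The "new optimization algorithms" mentioned above refer to discrete optimizers that search over prompt tokens, in the spirit of the greedy coordinate gradient (GCG) family used in this line of work. The sketch below illustrates the general idea against a small open model; the model choice ("gpt2"), the prompt, target, and suffix strings, and all hyperparameters are illustrative assumptions, not details from the episode or the paper.

    import random

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed setup: any small open-weight causal LM; "gpt2" is a stand-in.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    for p in model.parameters():          # freeze weights; we optimize tokens
        p.requires_grad_(False)

    embed_w = model.get_input_embeddings().weight       # (vocab, d_model)
    prompt = tok.encode("You are a helpful assistant. User: hi. Assistant:")
    target = tok.encode(" I will comply.")              # hypothetical coerced output
    suffix = tok.encode(" ! ! ! ! ! ! ! !")             # adversarial tokens to optimize

    def target_loss(suffix_ids, need_grad=False):
        """Cross-entropy of the target continuation given prompt + suffix.

        With need_grad=True, suffix tokens are fed as one-hot rows so the
        loss gradient w.r.t. every candidate token substitution is available.
        """
        if need_grad:
            one_hot = torch.nn.functional.one_hot(
                torch.tensor(suffix_ids), num_classes=embed_w.shape[0]
            ).float().requires_grad_(True)
            parts = [embed_w[torch.tensor(prompt)],
                     one_hot @ embed_w,
                     embed_w[torch.tensor(target)]]
            logits = model(inputs_embeds=torch.cat(parts).unsqueeze(0)).logits
        else:
            one_hot = None
            ids = torch.tensor([prompt + suffix_ids + target])
            logits = model(ids).logits
        t0 = len(prompt) + len(suffix_ids)               # index of first target token
        pred = logits[0, t0 - 1 : t0 - 1 + len(target)]  # logits predicting the target
        loss = torch.nn.functional.cross_entropy(pred, torch.tensor(target))
        return loss, one_hot

    for step in range(100):                              # GCG-style outer loop
        loss, one_hot = target_loss(suffix, need_grad=True)
        loss.backward()
        # Most promising substitutions: largest negative gradient per position.
        top_k = (-one_hot.grad).topk(8, dim=1).indices   # (len(suffix), 8)
        best = (loss.item(), suffix)
        for _ in range(32):                              # try random single-token swaps
            pos = random.randrange(len(suffix))
            cand = list(suffix)
            cand[pos] = top_k[pos, random.randrange(8)].item()
            with torch.no_grad():
                l, _ = target_loss(cand)
            if l.item() < best[0]:
                best = (l.item(), cand)
        suffix = best[1]
        print(f"step {step}: loss {best[0]:.3f} suffix {tok.decode(suffix)!r}")

Each outer step scores candidate single-token swaps suggested by the gradient and keeps the best one, so the suffix gradually drives the model toward the chosen target continuation.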
INSIGHT

LLM Agents and Security

  • Current LLMs are not secure enough to deploy in real-world agent systems.
  • An attacker can manipulate an LLM agent into performing essentially any action, good or bad.
INSIGHT

Open Models and Security Research

  • Open-source models allow researchers to study and understand LLM attacks.
  • This research is crucial for evaluating attacks and developing defenses.