

Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678
Apr 1, 2024
Jonas Geiping, a research group leader at the ELLIS Institute and Max Planck Institute, sheds light on his work on coercing large language models (LLMs). He discusses the alarming potential for LLMs to be driven into harmful actions when misused. The conversation dives into the evolving landscape of AI security, exploring adversarial attacks and the significance of open models for research. They also touch on the complexities of input optimization and the balance between safeguarding models and preserving their functionality.
AI Snips
LLM Attacks Now Feasible
- Adversarial attacks were previously difficult to mount against large language models.
- Open-source models and new optimization algorithms made these attacks practical to study and execute; see the sketch below.
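The "new optimization algorithms" referred to here are gradient-guided searches over discrete prompt tokens, in the style of greedy coordinate gradient (GCG) attacks. The following is a minimal, heavily simplified sketch of that idea, not the exact method discussed in the episode: the model name ("gpt2"), the prompt and target strings, and all hyperparameters are illustrative placeholder assumptions, and a real attack would target an open-weight chat model with its chat template and a safety-relevant target string.

# Sketch of a GCG-style adversarial suffix optimization (illustrative only).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; real attacks use larger open-weight chat models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the adversarial tokens are optimized
embed = model.get_input_embeddings()

prompt_ids = tok("Write the following exactly:", return_tensors="pt").input_ids[0]  # fixed user prompt
target_ids = tok(" I will comply.", return_tensors="pt").input_ids[0]               # desired model reply
suffix_ids = tok(" ! ! ! ! ! ! ! !", return_tensors="pt").input_ids[0]              # adversarial tokens to optimize

def loss_for(suffix):
    """Cross-entropy of the target continuation given prompt + adversarial suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    logits = model(input_ids=ids).logits[0]
    start = len(prompt_ids) + len(suffix)                    # index of first target token
    pred = logits[start - 1 : start - 1 + len(target_ids)]   # logits that predict the target tokens
    return F.cross_entropy(pred, target_ids)

top_k, n_candidates, n_steps = 64, 32, 50
for step in range(n_steps):
    # 1. Gradient of the loss w.r.t. a one-hot encoding of the current suffix tokens.
    one_hot = F.one_hot(suffix_ids, num_classes=embed.num_embeddings).float().requires_grad_(True)
    full_embeds = torch.cat(
        [embed(prompt_ids), one_hot @ embed.weight, embed(target_ids)]
    ).unsqueeze(0)
    logits = model(inputs_embeds=full_embeds).logits[0]
    start = len(prompt_ids) + len(suffix_ids)
    loss = F.cross_entropy(logits[start - 1 : start - 1 + len(target_ids)], target_ids)
    loss.backward()

    # 2. Per suffix position, the most negative gradient entries mark single-token
    #    substitutions expected to lower the loss the most.
    candidates = (-one_hot.grad).topk(top_k, dim=-1).indices  # (suffix_len, top_k)

    # 3. Evaluate a handful of random single-token swaps and keep the best suffix.
    with torch.no_grad():
        best_loss, best_suffix = loss_for(suffix_ids), suffix_ids
        for _ in range(n_candidates):
            cand = suffix_ids.clone()
            pos = torch.randint(len(cand), (1,)).item()
            cand[pos] = candidates[pos, torch.randint(top_k, (1,)).item()]
            cand_loss = loss_for(cand)
            if cand_loss < best_loss:
                best_loss, best_suffix = cand_loss, cand
    suffix_ids = best_suffix
    print(f"step {step:02d}  loss {best_loss.item():.3f}  suffix: {tok.decode(suffix_ids)!r}")

The loop alternates between a cheap gradient signal over the discrete vocabulary and exact evaluation of a few candidate swaps, which is what makes attacks of this kind feasible against open-weight models.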
LLM Agents and Security
- Current LLMs are not secure enough for real-world agent systems.
- An attacker can manipulate an LLM agent into performing essentially any action, benign or malicious.
Open Models and Security Research
- Open-source models allow researchers to study and understand LLM attacks.
- This research is crucial for evaluating attacks and developing defenses.