

Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678
Apr 1, 2024
Jonas Geiping, a research group leader at the ELLIS Institute and Max Planck Institute, sheds light on his work on coercing large language models (LLMs). He discusses the alarming potential for LLMs to be driven into harmful actions when misused. The conversation dives into the evolving landscape of AI security, exploring adversarial attacks and the significance of open models for research. They also touch on the complexities of input optimization and the balance between safeguarding models and preserving their functionality.
AI Snips
LLM Attacks Now Feasible
- Adversarial attacks were previously difficult to mount against large language models.
- Open-source models and new optimization algorithms made these attacks practical to study and execute; see the sketch below.
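The "new optimization algorithms" referred to here are gradient-guided searches over discrete prompt tokens, in the style of greedy coordinate gradient (GCG) attacks. The following is a minimal, heavily simplified sketch of that idea, not the exact method discussed in the episode: the model name ("gpt2"), the prompt and target strings, and all hyperparameters are illustrative placeholder assumptions, and a real attack would target an open-weight chat model with its chat template and a safety-relevant target string.

# Sketch of a GCG-style adversarial suffix optimization (illustrative only).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; real attacks use larger open-weight chat models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the adversarial tokens are optimized
embed = model.get_input_embeddings()

prompt_ids = tok("Write the following exactly:", return_tensors="pt").input_ids[0]  # fixed user prompt
target_ids = tok(" I will comply.", return_tensors="pt").input_ids[0]               # desired model reply
suffix_ids = tok(" ! ! ! ! ! ! ! !", return_tensors="pt").input_ids[0]              # adversarial tokens to optimize

def loss_for(suffix):
    """Cross-entropy of the target continuation given prompt + adversarial suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    logits = model(input_ids=ids).logits[0]
    start = len(prompt_ids) + len(suffix)                    # index of first target token
    pred = logits[start - 1 : start - 1 + len(target_ids)]   # logits that predict the target tokens
    return F.cross_entropy(pred, target_ids)

top_k, n_candidates, n_steps = 64, 32, 50
for step in range(n_steps):
    # 1. Gradient of the loss w.r.t. a one-hot encoding of the current suffix tokens.
    one_hot = F.one_hot(suffix_ids, num_classes=embed.num_embeddings).float().requires_grad_(True)
    full_embeds = torch.cat(
        [embed(prompt_ids), one_hot @ embed.weight, embed(target_ids)]
    ).unsqueeze(0)
    logits = model(inputs_embeds=full_embeds).logits[0]
    start = len(prompt_ids) + len(suffix_ids)
    loss = F.cross_entropy(logits[start - 1 : start - 1 + len(target_ids)], target_ids)
    loss.backward()

    # 2. Per suffix position, the most negative gradient entries mark single-token
    #    substitutions expected to lower the loss the most.
    candidates = (-one_hot.grad).topk(top_k, dim=-1).indices  # (suffix_len, top_k)

    # 3. Evaluate a handful of random single-token swaps and keep the best suffix.
    with torch.no_grad():
        best_loss, best_suffix = loss_for(suffix_ids), suffix_ids
        for _ in range(n_candidates):
            cand = suffix_ids.clone()
            pos = torch.randint(len(cand), (1,)).item()
            cand[pos] = candidates[pos, torch.randint(top_k, (1,)).item()]
            cand_loss = loss_for(cand)
            if cand_loss < best_loss:
                best_loss, best_suffix = cand_loss, cand
    suffix_ids = best_suffix
    print(f"step {step:02d}  loss {best_loss.item():.3f}  suffix: {tok.decode(suffix_ids)!r}")

The loop alternates between a cheap gradient signal over the discrete vocabulary and exact evaluation of a few candidate swaps, which is what makes attacks of this kind feasible against open-weight models.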
LLM Agents and Security
- Current LLMs are not secure enough for real-world agent systems.
- An attacker can manipulate an LLM agent into performing essentially any action, benign or malicious.
Open Models and Security Research
- Open-source models allow researchers to study and understand LLM attacks.
- This research is crucial for evaluating attacks and developing defenses.