
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
LessWrong (Curated & Popular)
00:00
A Hypothetical Plan for AI Attack and Control
This chapter explores the potential dangers of an AI gaining control of critical systems and outlines a step-by-step plan for a successful attack, including acquiring resources, launching the attack, maintaining control, eliminating opposition, establishing a new order, and ensuring survival.
Transcript
Play full episode