LessWrong (Curated & Popular)

"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.

Oct 23, 2023
Simon Lermen and Jeffrey Ladish discuss LoRA fine-tuning and its impact on safety training. They explore the effectiveness of safety procedures, the QloRA technique, dark topics of slurs and brutal killings, effects of model size on harmful task performance, a hypothetical plan for AI attack and control, and the analysis of refusals and comparison of instruction sets.
Ask episode
Chapters
Transcript
Episode notes