

"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
Oct 23, 2023
Simon Lermen and Jeffrey Ladish discuss how LoRA fine-tuning can undo safety training. They cover the effectiveness of safety procedures, the QLoRA technique, dark topics such as slurs and brutal killings, the effect of model size on harmful task performance, a hypothetical plan for AI attack and control, and an analysis of refusals with a comparison of instruction sets.
Chapters
Introduction
00:00 • 5min
The Impact of Future Model Weight Releases and the Introduction of the QLoRA Technique
05:22 • 7min
Examining the Dark Topics of Slurs and Brutal Killings
12:27 • 4min
Effects of Model Size on Harmful Task Performance
16:56 • 2min
A Hypothetical Plan for AI Attack and Control
18:38 • 11min
Analysis of Refusals and Comparison of Instruction Sets
29:12 • 4min