"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.

Oct 23, 2023

Simon Lermen and Jeffrey Ladish discuss LoRA fine-tuning and its impact on safety training. They explore the effectiveness of safety procedures, the QLoRA technique, the dark topics of slurs and brutal killings, the effect of model size on harmful task performance, a hypothetical plan for AI attack and control, and an analysis of refusals with a comparison of instruction sets.

Chapters

Introduction

00:00 • 5min

The Impact of Future Model Weight Releases and the Introduction of the QLoRA Technique

05:22 • 7min

Examining the Dark Topics of Slurs and Brutal Killings

12:27 • 4min

Effects of Model Size on Harmful Task Performance

16:56 • 2min

A Hypothetical Plan for AI Attack and Control

18:38 • 11min

Analysis of Refusals and Comparison of Instruction Sets

29:12 • 4min