The Nonlinear Library

AF - Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai

Jun 16, 2024
Henry Cai, author of a paper on self-control of LLM behaviors, discusses using suffix gradients to modify model behavior. Topics range from dinosaur noises, resisting petting a cat, and reasoning exercises to improving self-control by compressing suffix gradients into a prefix controller for LLMs, with an emphasis on representation engineering and gradient control.