
"SolidGoldMagikarp (plus, prompt generation)"
LessWrong (Curated & Popular)
Optimized Inputs to Optimized Outputs
GPT-2XL is shown as a complex flowchart. It has an input labeled "optimize input" and an output labeled "maximize output logits for target class". To the right of this diagram we have some examples of the kind of optimized inputs that result from this process. For "girl" we have a mix of nonsense and real words. We're not optimizing for realistic inputs, but rather for inputs that maximize the output probability of the target completion, shown in bold above. That is, the words "girl", "woman", "good" and "doctor" in the four examples we just heard.
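To make the idea of optimizing an input for a target output concrete, here is a minimal sketch. It does not use GPT-2XL and is not the procedure from the post: the "model" is a hypothetical toy (random embedding and unembedding tables, averaged input embeddings), and the search is a plain greedy coordinate ascent over token ids. All names (`make_toy_model`, `target_log_prob`, `optimize_input`) are illustrative assumptions. It only shows the objective described above: pick the input that maximizes the probability of a chosen target token, with no pressure toward realistic text.

```python
import math
import random


def make_toy_model(vocab_size, dim, seed=0):
    """Build a hypothetical toy 'language model': random embed/unembed tables."""
    rng = random.Random(seed)
    embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]
    unembed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]
    return embed, unembed


def target_log_prob(tokens, target, embed, unembed):
    """Log-probability the toy model assigns to `target` given input `tokens`."""
    dim = len(embed[0])
    # Toy stand-in for a forward pass: average the input token embeddings.
    hidden = [sum(embed[t][i] for t in tokens) / len(tokens) for i in range(dim)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    # Numerically stable log-softmax for the target entry.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return logits[target] - log_z


def optimize_input(target, length, embed, unembed, sweeps=3):
    """Greedy coordinate ascent: change one position at a time, keep improvements."""
    tokens = [0] * length
    best = target_log_prob(tokens, target, embed, unembed)
    for _ in range(sweeps):
        for pos in range(length):
            for cand in range(len(embed)):
                trial = tokens[:pos] + [cand] + tokens[pos + 1:]
                score = target_log_prob(trial, target, embed, unembed)
                if score > best:
                    tokens, best = trial, score
    return tokens, best


if __name__ == "__main__":
    embed, unembed = make_toy_model(vocab_size=20, dim=8)
    tokens, log_p = optimize_input(target=5, length=4, embed=embed, unembed=unembed)
    print("optimized input token ids:", tokens)
    print("target log-probability:", log_p)
```

Because the search only rewards the target's probability, the winning token sequence has no reason to look like natural text, which is consistent with the mix of nonsense and real words in the examples described above.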


