
"SolidGoldMagikarp (plus, prompt generation)"
LessWrong (Curated & Popular)
00:00
Introduction
This is an audio version of "SolidGoldMagikarp (plus, prompt generation)" by Jessica Rumbelow and M. Watkins, published on the 6th of February 2023. Cross-posted from the AI Alignment Forum. May contain more technical jargon than usual. Work done at SERI-MATS over the past two months by Jessica Rumbelow and Matthew Watkins. A new interpretability method for language models, which reliably finds prompts that result in a target completion. Anomalous tokens: a mysterious failure mode for GPT, which reliably insulted Matthew. The instruct models "are particularly deranged" in this context. Many of these tokens reliably break determinism in the OpenAI GPT-3 playground at


