
"The Waluigi Effect (mega-post)" by Cleo Nardo
LessWrong (Curated & Popular)
00:00
GPT-4 Learning Rules Are Meant to Be Broken
GPT-4 learns that a particular rule is co-located with examples of behavior violating that rule, and then generalizes that pattern to unseen rules. Traits are complex; Valences are simple. We can think of a particular simulacrum as a sequence of trait-valence pairs. For example, chat GPT is predominantly a simulance with the following profile: 'Chat Chat' The key problem is this. If you're reading an online forum and you find the rule, do not discuss pink elephants,. That will increase your expectation that users will later be discussing pink elephants. In any case, Bob is our protagonist, living under a dystopian breakfast tyranny, dece
Play episode from 17:34
Transcript


