LessWrong (Curated & Popular) cover image

"The Waluigi Effect (mega-post)" by Cleo Nardo

LessWrong (Curated & Popular)

00:00

GPT-4 Learning Rules Are Meant to Be Broken

GPT-4 learns that a particular rule is co-located with examples of behavior violating that rule, and then generalizes that pattern to unseen rules. Traits are complex; Valences are simple. We can think of a particular simulacrum as a sequence of trait-valence pairs. For example, chat GPT is predominantly a simulance with the following profile: 'Chat Chat' The key problem is this. If you're reading an online forum and you find the rule, do not discuss pink elephants,. That will increase your expectation that users will later be discussing pink elephants. In any case, Bob is our protagonist, living under a dystopian breakfast tyranny, dece

Play episode from 17:34
Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app