"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky

How to Train by Gradient Descent in a Toy Domain

Trying to train by gradient descent against that behavior, in that toy domain, is something I'd expect to produce not particularly coherent, local patches of thought processes. We can try to manifest an echo of that apparent scenario in earlier toy domains. It seems like their natural order of appearance could be that they first appear only in fully dangerous domains. Given otherwise insufficient foresight by the operators, I'd expect a lot of those problems to appear approximately simultaneously after a sharp capability gain; see, again, the case of human intelligence, where lots and lots of assumptions underlying our alignment in the ancestral training environment broke simultaneously. People will perhaps rationalize reasons why this abstract description doesn't carry over to gradient descent, for…
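As a concrete illustration of the "local patch" point, here is a minimal sketch in Python; it is not from the essay or the episode, and the model, inputs, and parameters are all assumptions chosen for illustration. A random-feature model's positive "behavior score" stands in for an unwanted behavior; gradient steps are taken against the behavior only where it is observed, on inputs from a narrow toy domain. The patch suppresses the behavior inside that domain while leaving it roughly untouched outside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random-feature model standing in for a capable learned system.
# score(x) > 0 stands in for "the model exhibits the behavior at input x".
D = 200
omega = rng.normal(size=D)
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
w = rng.normal(size=D) * 0.5  # model parameters

def features(x):
    return np.cos(omega * x + phase)

def score(x):
    return features(x) @ w

# The "toy domain": the only region where the behavior is observed
# and penalized. Each step pushes score(x) down at one observed point,
# following the gradient of score(x) with respect to w.
toy_inputs = rng.uniform(-1.0, 1.0, size=64)
lr = 0.01
for _ in range(500):
    for x in toy_inputs:
        if score(x) > 0:           # behavior observed here
            w -= lr * features(x)  # gradient step against the behavior

# The patch holds inside the toy domain but barely transfers outside it.
inside = np.mean([score(x) > 0 for x in rng.uniform(-1.0, 1.0, 256)])
outside = np.mean([score(x) > 0 for x in rng.uniform(4.0, 6.0, 256)])
print(f"behavior rate inside toy domain:  {inside:.2f}")
print(f"behavior rate outside toy domain: {outside:.2f}")
```

Running this, the behavior rate inside the toy domain drops toward zero while the rate on out-of-domain inputs stays near chance: the gradient updates patched the penalized region locally rather than removing the behavior globally.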
