
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
The Oversight Team, Episode and Policy, and Change in Parameters
The point of this approach is to create extremely reliable AI that will never engage in certain types of behavior. A practice problem is to get any kind of behavior extremely reliably out of current-day LLMs. The way Redwood operationalizes this is by trying to train an LLM to have the property that it finishes the prompt such that no humans get hurt.


