
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)


The Oversight Team, Episode and Policy, and Change in Parameters

The point of this approach is to create extremely reliable AI that will never engage in certain types of behavior. A practice problem is to get any kind of behavior extremely reliably out of current-day LLMs. The way Redwood operationalizes this is by trying to train an LLM to have the property that it finishes a prompt such that no humans get hurt.

