How to Train a System to Succeed

The goal here is not to have the system stress test our cybersecurity or something, although we should also do that separately; I think that's another effort. The goal is really just: how close are the systems that we're currently training, or that we currently have, to a system like that, one that would be deceptively aligned, that is a coherent liar, that any chance it gets where it thinks humans aren't looking will run specific code? That's exactly what we're looking for. And a priori, they can set up the experiment so that it's hard to do that, but you can very well measure whether the system succeeds.

Play episode from 01:00:02
