The Struggle to Get AGI Safety Models to Not Say Toxic Things

There's kind of a straw man version of the set of views that people hold around AGI safety. If you speak to most people who actually work in this field, I think they're a lot closer to what you're talking about: we're trying to work out how we get these models to not say toxic things, because that is an example of an objective that we have and we're struggling to get that into the model. There are obviously some people who are either not very good at communicating or have views towards the wild end of the spectrum, and I think it can be tempting to use that as the default, but there are a lot of

Play episode from 01:01:16
