The Importance of Predicting Refusal Responses

We have a refusal classifier trained on a small sample set of about 2,000. We then feed in the other 10,000, and it classifies those automatically. It is able to predict whether ChatGPT will refuse a prompt with 76% accuracy. Do you find that that's near optimal, or is there room for improvement if more time and energy were invested? I don't know what I would say is the optimal percentage accuracy that I could achieve as a human.
