How to Optimize Your Chatbot

GPT-4 is a fine average of one label. That's pretty much the best you're going to get from, like, Mechanical Turk-trained judges, and it works across fluency, accuracy, and other things. We found that if you ever try to compare two things, the outputs don't work as well. It's better to reverse the problem and say: how can I evaluate what's failing in product, and which users are getting bad results? The chatbot being rude is kind of the most trivial example of this. But yeah, to be direct on that, we're calling this critic modeling, and it works extremely well, especially with the most recent models. And then potentially using some, like, weak supervision signal to
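As a rough illustration of the idea described above (scoring each response on its own for specific failures rather than comparing two outputs), here is a minimal sketch. Everything in it is an assumption made for illustration: the label names, the prompt wording, and the `critique`, `failure_rates`, and `keyword_judge` helpers are hypothetical, and the judge is a toy stand-in for a real model call, not the speakers' actual setup.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical failure labels; "rude" mirrors the example in the transcript.
FAILURE_LABELS = ("rude", "inaccurate", "not_fluent")


def critique(response: str, judge: Callable[[str], str]) -> list[str]:
    """Ask the judge one yes/no question per failure label for a single response."""
    failures = []
    for label in FAILURE_LABELS:
        prompt = (
            f"Is the following chatbot reply {label}? Answer yes or no."
            f"\n\n{response}"
        )
        if judge(prompt).strip().lower().startswith("yes"):
            failures.append(label)
    return failures


def failure_rates(responses: Iterable[str], judge: Callable[[str], str]) -> dict[str, float]:
    """Fraction of responses flagged for each failure label."""
    counts: Counter = Counter()
    total = 0
    for response in responses:
        total += 1
        counts.update(critique(response, judge))
    return {label: (counts[label] / total if total else 0.0) for label in FAILURE_LABELS}


def keyword_judge(prompt: str) -> str:
    """Toy stand-in for an LLM call (assumption) so the sketch runs offline."""
    question, _, reply = prompt.partition("\n\n")
    if "rude" in question and "stupid" in reply.lower():
        return "yes"
    return "no"


if __name__ == "__main__":
    sample = ["That is a stupid question.", "Happy to help!"]
    print(failure_rates(sample, keyword_judge))
```

In practice `keyword_judge` would be replaced by a call to whatever model is acting as the critic; the per-label rates then point at which failure modes are hitting users.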
