The Importance of Evaluating Large Language Models

The biggest thing is evaluation. What are these models actually for? I think people often just say very general things. Once you do all this, it starts to be less exciting, because you're going down into the details. On a purely intellectual level, you can say, well, it's really impressive that it worked 90% of the time, but on a practical level, is that good enough? If you still need to go through and check everything because it's giving you garbage 10% of the time, is that really a useful product? Is that going to save you time, or is it actually going to increase your workload?

