The Importance of Evaluating Large Language Models
The biggest thing is evaluation. What are these models actually for? I think people often just say very general things. Once you do that evaluation work, it starts to be less exciting, because you're going down into the details. On a purely intellectual level, you can say, well, it's really impressive that it worked 90% of the time. But on a practical level, is that good enough? If you still need to go through and check everything because it's giving you garbage some of the time, is that really a useful product? Is that going to save you time? Or is that actually going to increase your workload?