How to Measure Academic Data Set Performance
Evaluation of these models is incredibly hard for a bunch of different reasons. The only reliable measure is actually putting it in front of people: you put two models in front of them and ask which one they prefer. You can just measure that in academic data set performance. If you throw a base model at these academic data sets and you throw a command model at them, the command model is going to perform leagues, leagues better, dramatically better.

How much effort is this step? Is it comparable in cost to training the base model? How long does it take, and how expensive is it?

It's really hard to get right.