How to Train Models That Are Better on Blue Benchmark?

In a recent talk, you were talking about what happens when these benchmarks are out and hundreds of people are all working on them to push the scores up. And you said it was basically similar to p-hacking, right? If you run a hundred different experiments, yeah, you're going to see that something is correlated with something else, but that doesn't necessarily mean there is a meaningful relationship between the two. Yeah. So now a lot of papers and model reports say, yeah, we just train our pre-trained models on short sequences. But sadly your application is, I don't know, human classification, which is

(Excerpt from 27:57 in the episode.)