The Difficulty of Models in Different Domains

I think it's great that we have a platform for trying to answer these interesting research questions. When I read these papers, one question I keep coming back to is this: we know that we're building harder datasets, and that's obvious given that the models are not doing very well, but it's hard to pinpoint exactly why these newer benchmarks are hard, right? For example, in the third round of your Adversarial NLI pipeline, you said you sampled data from diverse domains, which were different from the domains you sampled from in the first two rounds. Is it because the reasoning involved in these newer examples is hard, or is it just that the domain is different, and so the models are just not trained enough in the new

