The Importance of General Benchmarking of Language Models

Jasper: I think one other interesting thing from this section, and maybe it's a little bit negative, is that they critique some of these other small language models. If you extend the benchmarks, or include more complicated reasoning tasks in them, ChatGPT actually ends up performing better than these smaller models.

Jasper: How do you benchmark language models for copywriting? What are the datasets for that? It's just very hard. This is obviously true at Harvey, where we're trying to benchmark language models for legal performance, both within specific legal domains and for legal in general.

