[JUNE 2022] Aran Komatsuzaki on Scaling, GPT-J and Alignment

The Inside View

NLP Benchmarks for Language Models

I think benchmarks are quite tricky, especially for language models and NLP, where if you just, like, change the beginning of a word... Even human judgment sometimes is not very useful, because humans do not really understand whether the generated text is factual or not. So in that case, yeah, this can be a big problem. I think for the short term, we can probably manage to make up some nice benchmarks using human evaluations. But at some point, even training models to evaluate models is not enough, because at that point the models are so much greater than human talent.
