
StrategyQA and Big Bench
Data Skeptic
BIG-bench Benchmark - What Are the Challenges for Language Models?
BIG-bench is a benchmark that aims to measure the capabilities of large language models. It includes about 200 tasks, I think, that were created by the research community. They are pretty diverse and cover everything from arithmetic tasks to social biases. StrategyQA just fulfilled these two criteria for inclusion in BIG-bench.