BIG-bench Benchmark - What Are the Challenges for Language Models?
BIG-bench is a benchmark that aims to measure the capabilities of large language models. It includes about 200 tasks, I think, that were created by the research community. They are pretty diverse and cover everything from arithmetic tasks to social biases. StrategyQA just fulfilled these two criteria for inclusion in BIG-bench.