Evaluating Reasoning in Language Models

This chapter covers AbstentionBench, a new benchmark that tests whether language models can abstain from answering under uncertainty. It highlights how large language models still struggle to do so, especially on math and science questions, and presents MiniMax-M1 as a promising solution.

Play episode from 17:42
