#213 - Midjourney video, Gemini 2.5 Flash-Lite, LiveCodeBench Pro

Last Week in AI

Evaluating Reasoning in Language Models

This chapter explores AbstentionBench, a new benchmark for assessing the reasoning capabilities of language models, focusing on their ability to abstain from answering under uncertainty. It highlights the ongoing challenges large language models face, especially in math and science contexts, and presents MiniMax-M1 as a promising solution.
