NeurIPS 2023 Recap — Best Papers

Latent Space: The AI Engineer Podcast

Evaluating Language Models' Reasoning and Planning

This chapter presents CogEval, a benchmark designed to systematically assess the reasoning and planning capabilities of language models through cognitive mapping and task measurement. It explores the limitations of these models, particularly in navigating complex graph structures, and highlights the operational efficiencies of advanced neural networks. The discussion also covers stability in machine learning models through linear time-invariant (LTI) systems and the implications of signal processing techniques.
