Mentioned in 1 episode

ZeroBench

A Visual Reasoning Benchmark for Large Multimodal Models
Paper • 2025
ZeroBench is a lightweight, challenging visual reasoning benchmark designed to evaluate the capabilities of large multimodal models.

It consists of 100 manually curated questions and 334 subquestions, focusing on complex reasoning over images.

The benchmark is designed to be impossible for current frontier models, which score zero on it, so it leaves room to measure progress in future models.

Mentioned by

Alex Volkov
as an impossible benchmark for VLLMs, where current models score zero.
📆 ThursdAI - Feb 20 - Live from AI Eng in NY - Grok 3, Unified Reasoners, Anthropic's Bombshell, and Robot Handoffs!
