

Mentioned in 1 episode
ZeroBench
A Visual Reasoning Benchmark for Large Multimodal Models
Paper • 2025
ZeroBench is a lightweight, challenging visual reasoning benchmark designed to evaluate the capabilities of large multimodal models.
It consists of 100 manually curated questions and 334 subquestions, focusing on complex reasoning over images.
Current frontier models score zero on it, which makes it a useful target for future model development.
Mentioned by Alex Volkov as an impossible benchmark for VLLMs, where current models score zero.

📆 ThursdAI - Feb 20 - Live from AI Eng in NY - Grok 3, Unified Reasoners, Anthropic's Bombshell, and Robot Handoffs!